fix(testgen): kill the whole subprocess tree on tg start/install shutdown#88
Merged
Conversation
…down `proc.terminate()` on Windows calls `TerminateProcess` and only kills the parent — pixeltable-pgserver's `postgres.exe` children get orphaned. The symptom: `tg delete` fails with "Could not remove TestGen data directory" because postgres holds the data files open. New `stop_app_tree(proc, timeout=...)` walks the descendant tree: - Windows: shells out to `taskkill /F /T /PID` to walk the WMI process tree. - POSIX: sends SIGTERM to the process group (parent is now spawned with `start_new_session=True`), escalating to SIGKILL on timeout. Routed all three termination sites in `start_testgen_app` through the helper (port-readiness timeout, KeyboardInterrupt handler, finally cleanup). Added unit tests covering the platform branching and the SIGKILL escalation path. Note: this is a force-kill safety net for uncatchable termination paths (installer crash, terminal X-button close, hang). Graceful shutdown on Ctrl+C is a separate fix in pixeltable-pgserver / testgen's signal handling — see the issue we drafted for the testgen team. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GitHub's Linux runner runs the test container as root. Existing Popen-mocked tests (like ``test_start_testgen_app_happy_path``) routed through ``stop_app_tree`` in the new ``finally`` block, where ``os.killpg(os.getpgid(proc.pid), SIGTERM)`` resolved to ``pgid 1`` (``MagicMock.__index__`` defaults to 1) and actually signalled init — killing the container and producing "The runner has received a shutdown signal". macOS swallowed the same call as ``EPERM``; CI didn't. Adds an autouse fixture that patches ``os.killpg`` and ``os.getpgid`` so no test can accidentally signal a real process group. The 4 unit tests for ``stop_app_tree`` already wrap their own ``patch`` over these and keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
168a27e to
32d5775
Compare
Previously the installer only verified that the docker CLI and engine were available. A user with the CLI but no Compose plugin (e.g. distro docker.io without docker-compose-plugin) would pass the picker's prereq display, then fail later inside ComposeStartStep / docker compose exec. Adds REQ_DOCKER_COMPOSE (probes docker compose version) and threads it through every action that shells out to docker compose — Obs install/upgrade/delete and the Docker-mode TestGen actions (install/upgrade/start/delete/run-demo/delete-demo). DemoContainerAction and the pip+--export branch of tg run-demo keep the original Docker-only list since those paths only use docker run. The auto-mode picker hides REQ_DOCKER from its prereq line — its failure is implied by REQ_DOCKER_COMPOSE failing, and a precise per-prereq fix message still fires at command-line invocation time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rboni-dk
approved these changes
May 7, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
proc.terminate()on Windows callsTerminateProcess, which kills only the parent and leaves pixeltable-pgserver'spostgres.exechildren running. The visible symptom istg deletefailing with "Could not remove TestGen data directory" because postgres holds the data files open.stop_app_tree(proc, timeout=...)helper walks the descendant tree:taskkill /F /T /PIDon Windows,os.killpg(SIGTERM)(with SIGKILL escalation) on POSIX.start_testgen_appnow spawns testgen withstart_new_session=Trueon POSIX so the whole group can be signaled together. All three termination sites (port-readiness timeout,KeyboardInterrupthandler,finallycleanup) route through the new helper.Scope note
This is a force-kill safety net for uncatchable termination paths: installer crash, terminal closed via the X button, hang in testgen, port timeout. Graceful shutdown on Ctrl+C is a complementary fix in pixeltable-pgserver / testgen's own signal handling — that change is happening separately on the TestGen side and doesn't make this one redundant. The two fixes cover different failure modes.
Also in this PR: explicit Docker Compose prereq check
Previously a machine with the
dockerCLI + a running engine but no Compose plugin (e.g. distrodocker.iowithoutdocker-compose-plugin) would pass the auto-mode picker's prereq display, then fail later insideComposeStartStep/docker compose exec.REQ_DOCKER_COMPOSE(probesdocker compose version).docker compose: Obs install/upgrade/delete and the Docker-mode TestGen actions (install/upgrade/start/delete/run-demo/delete-demo).DemoContainerActionand the pip+--exportbranch oftg run-demokeep the original Docker-only list since those paths only usedocker run.REQ_DOCKERfrom its rendered prereq line (Docker installedis implied byDocker Compose installed) — purely cosmetic; the underlying check still runs so atg install --dockerinvocation on a Docker-less box still gets the precise "Install Docker and try again" message.Test plan
pytest— 182 passing locally (4 new unit tests for the kill-tree helper covering: no-op when proc already exited, Windows taskkill path, POSIX killpg path, SIGKILL escalation on timeout)ruff check+ruff format --checkcleanGet-Process postgres -ErrorAction SilentlyContinue(some orphans expected until the testgen-side graceful shutdown lands; this PR ensures the installer's own unilateral termination paths don't leak)docker.iobut nodocker-compose-plugin:tg installshould now fail prereq check with "Install the Docker Compose plugin and try again" instead of crashing inside the install steps🤖 Generated with Claude Code