Skip to content

Fix Windows E2E hangs: restore per-command CLI shutdown and isolate the daemon per home#4075

Open
gavande1 wants to merge 8 commits into
trunkfrom
combine-cli-shutdown-fix-and-daemon-isolation
Open

Fix Windows E2E hangs: restore per-command CLI shutdown and isolate the daemon per home#4075
gavande1 wants to merge 8 commits into
trunkfrom
combine-cli-shutdown-fix-and-daemon-isolation

Conversation

@gavande1

@gavande1 gavande1 commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Related issues

Proposed Changes

Windows E2E has been broken since June 29 by two independent regressions that merged the same day, which is why earlier single-PR reverts never fixed it:

  1. Extract the CLI spawn helper to @studio/common #3954 caused the 3-hour hangs. Its killAll() on will-quit runs one listener after the quit handler that spawns site stop --all, so the stop command is killed before it can stop any site. Sites leak into the machine-global process-manager daemon until its capacity cap is exhausted and every subsequent site creation times out. Fixed here by restoring per-command CLI shutdown handling (Test restoring per-command CLI shutdown handling #4070): each CLI child registers its own quit-time kill handler when spawned, so the quit-time stop survives. Verified on CI: quit-time stops complete in under a second (previously a 20-second timeout on every session) and zero capacity errors.

  2. Fix native PHP sites loading assets from remote siteurl/home (STU-1925) #3988 caused the ~25 test failures. It made every native-PHP site start pass an auto_prepend_file (the site-url prepend that loads reprint's runtime, including the SQLite loader) as an unquoted -d directive. On CI Windows agents the file lives under an 8.3 short path (C:\Users\BUILDK~1\...), and the unquoted ~ fails PHP's INI parsing (syntax error, unexpected '~' in Unknown on line 4 — visible in the daemon logs this PR now uploads). PHP drops the directive, WordPress boots without its SQLite driver, and every page renders an instant database error — which is why browser-driven tests saw non-WordPress pages while the app's own UI worked. Fixed by quoting the value like every other path directive. This never reproduced on developer machines because short usernames don't get 8.3-mangled.

Also included:

  • Isolate the process-manager daemon per home on Windows (Isolate the process-manager daemon per home on Windows #4061): the daemon pipe is scoped per home directory instead of machine-global, so even if a session leaks it cannot poison other sessions or other builds on the same agent. Includes that branch's daemon-log diagnostics (adjusted here to keep the daemon home in a short tmp path — the original diagnostic placed the daemon's Unix socket under a long test-results path, which exceeds the macOS/Linux socket path limit and broke every site creation with connect EINVAL; logs are now copied into test-results at cleanup instead) and the PHP-binary re-download fix.
  • CI reporting fixes: one combined "E2E Tests" GitHub status for all platforms (previously last-writer-wins let a green mac job mask a failing Windows job), a 100-minute job timeout, a cancellation trap so a timed-out job cannot record exit status 0 and turn the build green, and per-test/per-action log streaming with failure screenshots and daemon logs as artifacts.

Testing Instructions

  • CI (Windows E2E): the job should finish well under the 100-minute cap with no site stop --all command timed out lines, no CAPACITY_LIMIT_REACHED errors, and no syntax error, unexpected '~' lines in the daemon logs. Browser-driven tests (blueprints, wp-admin shortcuts, homepage) should pass.
  • GitHub status: the "E2E Tests" context stays pending until mac, Windows, and Linux all finish, then reports one combined result.
  • Unit: npm test -- apps/cli/tests/ (817 tests pass).

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

@gavande1 gavande1 requested a review from a team as a code owner July 3, 2026 11:47
gcsecsey and others added 6 commits July 3, 2026 17:20
On Windows the daemon's control/events sockets were hardcoded named pipes, so every
process shared one machine-global daemon regardless of STUDIO_PROCESS_MANAGER_HOME —
silently defeating the per-run isolation that already works on macOS/Linux and that the
CLI test harness relies on. Derive the pipe name from the home when a custom one is set;
the default home keeps its original fixed name so the desktop app and CLI still share a
single daemon exactly as before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
The build skipped the PHP download whenever apps/studio/php-bin/<packageId>/ already
existed, so a package republished in place (same packageVersion, new checksum — e.g.
adding the Windows VC++ runtime DLLs) was never picked up on reused CI agents, which kept
bundling a stale php.exe. Record the archive SHA on install and re-download when it no
longer matches the expected SHA.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
Shorten stale-PHP log line to satisfy prettier

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
@
Temporary instrumentation to find why native PHP doesn't serve on Windows CI.
Routes the E2E process-manager daemon and its per-process logs under
test-results/daemon-logs so they upload as artifacts, and logs any unexpected
php.exe exit code (a Windows loader/DLL failure exits 3221225781 with no stderr).
Revert once diagnosed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
@gavande1 gavande1 force-pushed the combine-cli-shutdown-fix-and-daemon-isolation branch from 05d1923 to 01bd9b3 Compare July 3, 2026 11:52

@gcsecsey gcsecsey left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this, I think it makes sense to land these fixes together, and continue investigating the failing tests separately. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants