Fix Windows E2E hangs: restore per-command CLI shutdown and isolate the daemon per home #4075
Open
gavande1 wants to merge 8 commits into
On Windows the daemon's control/events sockets were hardcoded named pipes, so every process shared one machine-global daemon regardless of STUDIO_PROCESS_MANAGER_HOME — silently defeating the per-run isolation that already works on macOS/Linux and that the CLI test harness relies on. Derive the pipe name from the home when a custom one is set; the default home keeps its original fixed name so the desktop app and CLI still share a single daemon exactly as before. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
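One way to derive a per-home pipe name is sketched below. This is a minimal illustration and not the code from this PR: the function name `controlPipePath`, the constant `DEFAULT_PIPE_NAME` and its value, and the choice of a truncated SHA-256 suffix are all assumptions. The caller is assumed to pass the value of `STUDIO_PROCESS_MANAGER_HOME` only when a custom home is set.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical fixed name, used when no custom home is set.
const DEFAULT_PIPE_NAME = "studio-process-manager";

/**
 * Returns a Windows named-pipe path for the daemon control socket.
 * - No custom home: the fixed, machine-wide name (existing behaviour).
 * - Custom home: the fixed name plus a short hash of the resolved home,
 *   so two different homes never share a pipe.
 */
export function controlPipePath(customHome?: string): string {
  if (!customHome) {
    return `\\\\.\\pipe\\${DEFAULT_PIPE_NAME}`;
  }
  // Windows paths are case-insensitive, so normalise before hashing.
  const normalized = path.win32.resolve(customHome).toLowerCase();
  const suffix = createHash("sha256")
    .update(normalized)
    .digest("hex")
    .slice(0, 16);
  return `\\\\.\\pipe\\${DEFAULT_PIPE_NAME}-${suffix}`;
}
```

Hashing keeps the pipe name short and free of characters that are awkward in pipe names, at the cost of the name no longer being human-readable.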
The build skipped the PHP download whenever apps/studio/php-bin/<packageId>/ already existed, so a package republished in place (same packageVersion, new checksum — e.g. adding the Windows VC++ runtime DLLs) was never picked up on reused CI agents, which kept bundling a stale php.exe. Record the archive SHA on install and re-download when it no longer matches the expected SHA. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
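The record-and-compare check described above could look roughly like this. It is a sketch under assumptions: the marker file name `.archive-sha256` and the helper names are hypothetical, and the actual build script may store the SHA differently.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical marker file written next to the extracted binaries.
const SHA_MARKER = ".archive-sha256";

/** SHA-256 of a downloaded archive, as lowercase hex. */
export function sha256OfFile(archivePath: string): string {
  return createHash("sha256").update(fs.readFileSync(archivePath)).digest("hex");
}

/** Record which archive an install directory was extracted from. */
export function recordInstalledSha(installDir: string, sha: string): void {
  fs.mkdirSync(installDir, { recursive: true });
  fs.writeFileSync(path.join(installDir, SHA_MARKER), sha + "\n");
}

/**
 * True when the package must be downloaded (again): the marker is missing
 * (no install, or an install that predates the marker) or it differs from
 * the expected SHA.
 */
export function needsDownload(installDir: string, expectedSha: string): boolean {
  const marker = path.join(installDir, SHA_MARKER);
  if (!fs.existsSync(marker)) return true;
  const installed = fs.readFileSync(marker, "utf8").trim().toLowerCase();
  return installed !== expectedSha.toLowerCase();
}
```

Treating a missing marker as stale means agents that cached a directory before the marker existed re-download once and then converge.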
Shorten stale-PHP log line to satisfy prettier Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
Temporary instrumentation to find why native PHP doesn't serve on Windows CI. Routes the E2E process-manager daemon and its per-process logs under test-results/daemon-logs so they upload as artifacts, and logs any unexpected php.exe exit code (a Windows loader/DLL failure exits 3221225781 with no stderr). Revert once diagnosed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Vfk7xXMfsABh5JMYX51wiS
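The exit code quoted above is easier to recognise in hex: 3221225781 is 0xC0000135, the NTSTATUS value STATUS_DLL_NOT_FOUND. A small formatter along these lines could make such log lines self-explanatory. The function names and the lookup table are hypothetical and are not part of this PR's instrumentation.

```typescript
/** Formats a Windows process exit code as an unsigned 32-bit hex NTSTATUS. */
export function toNtStatusHex(exitCode: number): string {
  // `>>> 0` reinterprets a possibly negative signed 32-bit value as unsigned.
  const hex = (exitCode >>> 0).toString(16).toUpperCase().padStart(8, "0");
  return "0x" + hex;
}

// A few well-known loader/crash statuses, for friendlier log lines.
const KNOWN_STATUSES: Record<string, string> = {
  "0xC0000135": "STATUS_DLL_NOT_FOUND",
  "0xC0000139": "STATUS_ENTRYPOINT_NOT_FOUND",
  "0xC0000005": "STATUS_ACCESS_VIOLATION",
};

/** Returns the decimal code with its hex form and, when known, its name. */
export function describeExit(exitCode: number): string {
  const hex = toNtStatusHex(exitCode);
  const name = KNOWN_STATUSES[hex];
  return name ? `${exitCode} (${hex}, ${name})` : `${exitCode} (${hex})`;
}

console.log(describeExit(3221225781));
// prints "3221225781 (0xC0000135, STATUS_DLL_NOT_FOUND)"
```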
05d1923 to 01bd9b3
gcsecsey (Member) approved these changes, Jul 3, 2026
Thanks for creating this, I think it makes sense to land these fixes together, and continue investigating the failing tests separately. 👍
Related issues
Proposed Changes
Windows E2E has been broken since June 29 by two independent regressions that merged the same day, which is why earlier single-PR reverts never fixed it:

- "Extract the CLI spawn helper to @studio/common" #3954 caused the 3-hour hangs. Its killAll() on will-quit runs one listener after the quit handler that spawns site stop --all, so the stop command is killed before it can stop any site. Sites leak into the machine-global process-manager daemon until its capacity cap is exhausted and every subsequent site creation times out. Fixed here by restoring per-command CLI shutdown handling ("Test restoring per-command CLI shutdown handling" #4070): each CLI child registers its own quit-time kill handler when spawned, so the quit-time stop survives. Verified on CI: quit-time stops complete in under a second (previously a 20-second timeout on every session) and there are zero capacity errors.
- "Fix native PHP sites loading assets from remote siteurl/home (STU-1925)" #3988 caused the ~25 test failures. It made every native-PHP site start pass an auto_prepend_file (the site-url prepend that loads reprint's runtime, including the SQLite loader) as an unquoted -d directive. On CI Windows agents the file lives under an 8.3 short path (C:\Users\BUILDK~1\...), and the unquoted ~ fails PHP's INI parsing (syntax error, unexpected '~' in Unknown on line 4; visible in the daemon logs this PR now uploads). PHP drops the directive, WordPress boots without its SQLite driver, and every page renders an instant database error, which is why browser-driven tests saw non-WordPress pages while the app's own UI worked. Fixed by quoting the value like every other path directive. This never reproduced on developer machines because short usernames don't get 8.3-mangled.

Also included:
- … connect EINVAL; logs are now copied into test-results at cleanup instead) and the PHP-binary re-download fix.

Testing Instructions
- … "site stop --all command timed out" lines, no CAPACITY_LIMIT_REACHED errors, and no "syntax error, unexpected '~'" lines in the daemon logs. Browser-driven tests (blueprints, wp-admin shortcuts, homepage) should pass.
- npm test -- apps/cli/tests/ (817 tests pass).

Pre-merge Checklist