Skip to content

fix: stabilize Windows-flaky tests (lifecycle teardown race + egg-bin timeouts)#5978

Merged
killagu merged 5 commits into
nextfrom
fix/windows-flaky-teardown-race
Jun 20, 2026
Merged

fix: stabilize Windows-flaky tests (lifecycle teardown race + egg-bin timeouts)#5978
killagu merged 5 commits into
nextfrom
fix/windows-flaky-teardown-race

Conversation

@killagu

@killagu killagu commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Fixes two independent sources of Windows CI test flakiness on next.

1. fix(core): teardown close/load race (the @eggjs/session flake)

Symptom

Windows CI flakily failed @eggjs/session test/app/middleware/session.test.ts with app has been closed / Can't find viewEngine unhandled rejections (e.g. CI run 27204947537). The failing assertion points at an unrelated file — the signature of a process-level unhandled rejection leaking across files under vitest isolate: false (continuation of #5964).

Root cause

mm.app() loads an app and an agent, and load() runs as a registerBeforeStart hook on process.nextTick, so it can still be in flight when close() runs (slow Windows fs widens the race). Lifecycle.close() does not wait for the in-flight load, so it flips #isClosed while load() is still running. The still-loading code then reaches Lifecycle.registerBeforeClose() — directly in egg.ts load(), or lazily via coreLoggercreateLoggers() from dumpTiming / _unhandledRejectionHandler — which assert-threw app has been closed. Inside an async load / rejection handler that becomes a process-level unhandled rejection, which isolate: false attributes to whatever test file is currently running.

Fix

  • Lifecycle.registerBeforeClose() now refuses (returns false, no throw) when the app is closing or closed instead of asserting. The guard covers close in progress too (new #isClosing flag set at the top of close(), before the close-callback snapshot is taken), so a hook registered mid-close is not silently stranded.
  • EggApplicationCore.load() checks that return value: when registration is refused by a teardown race, it cleans up the resources it already created — the unhandledRejection listener, the messenger (IPC listeners) and any lazily-created loggers (file descriptors) — so none leak across files, then returns without loading a torn-down app.
  • New Lifecycle.isClosed / isClosing getters + regression tests (after-close and during-close).

2. test(egg-bin): Windows CI timeouts

Test bin (windows-latest, 24) flakily timed out (every failure was Test timed out in 60000ms, no assertion failures). The egg-bin command tests spawn child egg-bin processes (vitest-in-vitest) that run 50s+ each on Windows; under full parallelism + coverage they tip over the 60s timeout. This reproduces on plain next (run 27863137812), independent of fix #1.

Mirror the root config's Windows handling in tools/egg-bin/vitest.config.ts: on Windows CI, raise testTimeout/hookTimeout to 120s and cap maxWorkers to 2 so fewer child-process trees compete. No effect off Windows.

Test evidence

  • packages/core full suite (incl. new lifecycle regression tests) — 476 pass; typecheck + lint + format clean.
  • plugins/session/.../session.test.ts — pass.
  • tools/egg-bin/test/commands/dev.test.ts — pass locally (≈43s for one file, confirming the subprocess slowness).
  • CI after fix Opensource version of egg-bin #1: the main Test matrix incl. Test (windows-latest, 22/24) passed — only the pre-existing egg-bin Windows timeout remained, which fix egg-mock #2 targets.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added lifecycle state visibility via isClosed and isClosing; registerBeforeClose now returns success/failure.
  • Bug Fixes
    • Prevent late close-hook registration during in-flight/finished shutdown, avoiding stranded hooks.
    • Improved load/close race safety by short-circuiting teardown when close-hook registration can’t be accepted.
  • Tests
    • Added coverage for close-in-flight and post-close hook registration races.
  • Documentation
    • Updated guidance for vitest isolate:false state leak scenarios.
  • Chores
    • Tuned Vitest timeouts/worker limits for Windows CI and split a slow CLI test to reduce flakiness.

Windows CI flakily failed `@eggjs/session` `session.test.ts` with
"app has been closed" / "Can't find viewEngine" unhandled rejections.

Root cause: `mm.app()` loads an app and an agent whose `load()` runs as a
`registerBeforeStart` hook on `process.nextTick`, so it can still be in flight
when `close()` runs (slow Windows fs widens the window). `Lifecycle.close()`
does not wait for the in-flight load, so it sets `#isClosed = true` while
`load()` is still running. The loading code then calls
`Lifecycle.registerBeforeClose()` — directly in `egg.ts` `load()`, or lazily
via `coreLogger` -> `createLoggers()` reached from `dumpTiming` /
`_unhandledRejectionHandler` — which `assert(#isClosed === false)` turned into a
thrown "app has been closed". That throw became a process-level unhandled
rejection, which under vitest `isolate: false` is attributed to whatever test
file is currently running, failing an unrelated file.

Fix:
- `Lifecycle.registerBeforeClose()` now skips (no-op + debug log) when the
  lifecycle is already closed instead of asserting/throwing — a hook registered
  after close would never fire anyway.
- `EggApplicationCore.load()` short-circuits when `lifecycle.isClosed` is
  already true: it removes the `unhandledRejection` listener it just added (so
  it does not leak across files) and returns without loading a torn-down app.
- Add a `Lifecycle.isClosed` getter and a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 20, 2026 07:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3bc895b-5683-4cf3-9889-d96099d7df38

📥 Commits

Reviewing files that changed from the base of the PR and between 6d94be5 and 85cbdf6.

📒 Files selected for processing (1)
  • packages/egg/src/lib/egg.ts

📝 Walkthrough

Walkthrough

Adds isClosing and isClosed getters to expose Lifecycle state. Changes registerBeforeClose to return false instead of throwing when the lifecycle is already closing or closed. Sets #isClosing = true at the start of close() to refuse late hook registrations. Adds regression tests for post-close and during-close registration races. Guards EggApplicationCore to clean up resources immediately when registerBeforeClose fails and refactors timeout management. Updates wiki documentation describing the teardown/load race fix. Configures vitest Windows CI test timeouts, worker limits, and splits a slow sequential test to reduce contention.

Changes

Lifecycle teardown race fix and test hardening

Layer / File(s) Summary
Lifecycle state flags and getters
packages/core/src/lifecycle.ts
Introduces a private #isClosing flag, initializes it to false in the constructor, and adds public isClosed and isClosing getters to expose the internal state.
registerBeforeClose race guard and close() timing
packages/core/src/lifecycle.ts
Changes registerBeforeClose to return boolean: returns false (and logs) when #isClosing or #isClosed is already true, else registers the hook and returns true. Also sets #isClosing = true at the start of close() before snapshotting hooks, ensuring registrations during in-flight close are refused.
Regression tests
packages/core/test/lifecycle.test.ts
Adds two async test cases: one verifying registerBeforeClose returns false after close() completes without throwing, and one verifying registration during an in-flight close() is refused and does not leave a stranded hook.
EggApplicationCore teardown on registration failure
packages/egg/src/lib/egg.ts
Captures the return value of registerBeforeClose(). When registration fails, immediately removes the unhandledRejection listener, closes the messenger, closes any lazily-created loggers, clears the start-timeout timer, and returns early to prevent resource leaks in shutdown races. Refactors timeout management to use an instance field for cleaner teardown.
Wiki documentation
wiki/concepts/vitest-isolate-false-state-leaks.md, wiki/index.md, wiki/log.md
Extends source files list in the concept doc, adds root cause entry 4 describing the teardown/in-flight load race and the implemented fixes, updates the index bullet text, and adds a 2026-06-20 log entry.
Windows CI test contention reduction
tools/egg-bin/vitest.config.ts, tools/egg-bin/test/my-egg-bin.test.ts
Detects Windows CI environment and adjusts testTimeout, hookTimeout to 120000 ms, and sets maxWorkers: 2 only on Windows CI. Splits a combined nsp test into two separate async test cases with comments explaining per-fork isolation to prevent slow startup timeouts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • eggjs/egg#5904: Modifies packages/core/src/lifecycle.ts close hook execution control flow, overlapping with the same lifecycle close and hook handling logic changed in this PR.

Suggested reviewers

  • jerryliang64
  • gxkl
  • akitaSummer

Poem

🐇 A rabbit checks the closing door—
Is it locked? Did we close before?
Late arrivals get "false" returned,
No stranded hooks or listeners burned.
Windows timers get more breathing room,
Teardown now is safe and soon! 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically summarizes the two main changes: fixing Windows-flaky tests caused by lifecycle teardown race and egg-bin timeouts.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/windows-flaky-teardown-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 20, 2026

Copy link
Copy Markdown

Deploying egg with  Cloudflare Pages  Cloudflare Pages

Latest commit: 85cbdf6
Status: ✅  Deploy successful!
Preview URL: https://b2a99b74.egg-cci.pages.dev
Branch Preview URL: https://fix-windows-flaky-teardown-r.egg-cci.pages.dev

View logs

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a teardown and in-flight load race condition that causes flaky test failures under Vitest with isolate: false. It introduces an isClosed getter to Lifecycle, skips registering close hooks if the lifecycle is already closed instead of throwing an error, and short-circuits the application loading process early if the lifecycle has closed. The review feedback highlights that when short-circuiting the load process early, other initialized resources like messenger and loggers should also be cleaned up to prevent resource and memory leaks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread packages/egg/src/lib/egg.ts Outdated
Comment on lines +251 to +254
if (this.lifecycle.isClosed) {
process.removeListener('unhandledRejection', this._unhandledRejectionHandler);
return;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When short-circuiting load() early due to a teardown race, we must also clean up other resources initialized in the constructor or during the early load phase (such as this.messenger and any lazily created this.#loggers). Otherwise, their process-level event listeners (IPC) and file descriptors will leak, causing state/memory leaks under isolate: false.

    if (this.lifecycle.isClosed) {
      process.removeListener('unhandledRejection', this._unhandledRejectionHandler);
      this.messenger.close();
      if (this.#loggers) {
        for (const logger of this.#loggers.values()) {
          logger.close();
        }
      }
      return;
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in f0af5ef. The early-return path now closes the messenger (IPC listeners) and any lazily-created loggers (#loggers, file descriptors) in addition to removing the unhandledRejection listener, so nothing leaks across files under isolate: false.

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.57%. Comparing base (0333c73) to head (85cbdf6).
⚠️ Report is 1 commits behind head on next.

Additional details and impacted files
@@            Coverage Diff             @@
##             next    #5978      +/-   ##
==========================================
+ Coverage   85.50%   85.57%   +0.06%     
==========================================
  Files         669      669              
  Lines       19825    19849      +24     
  Branches     3917     3923       +6     
==========================================
+ Hits        16952    16986      +34     
+ Misses       2481     2474       -7     
+ Partials      392      389       -3     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 20, 2026

Copy link
Copy Markdown

Deploying egg-v3 with  Cloudflare Pages  Cloudflare Pages

Latest commit: 85cbdf6
Status: ✅  Deploy successful!
Preview URL: https://3b0194fb.egg-v3.pages.dev
Branch Preview URL: https://fix-windows-flaky-teardown-r.egg-v3.pages.dev

View logs

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/core/src/lifecycle.ts`:
- Around line 256-266: The guard in the registerBeforeClose() method only checks
if the app has already closed (`#isClosed`), but it does not guard against hooks
being registered while a close operation is in-progress. When close() starts but
has not yet finished, registerBeforeClose() can still accept and store hooks
after the close callback snapshot is taken, causing those hooks to never execute
and leaving teardown cleanup incomplete. Add an additional check for a flag that
tracks whether the close operation is currently in-progress (not just whether it
has finished) alongside the existing `#isClosed` check to prevent hooks from being
registered during the entire close lifecycle.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4df2e1a5-84c0-4dd4-bb4c-6dd4934deb3a

📥 Commits

Reviewing files that changed from the base of the PR and between 34d8732 and 8b68070.

📒 Files selected for processing (6)
  • packages/core/src/lifecycle.ts
  • packages/core/test/lifecycle.test.ts
  • packages/egg/src/lib/egg.ts
  • wiki/concepts/vitest-isolate-false-state-leaks.md
  • wiki/index.md
  • wiki/log.md

Comment thread packages/core/src/lifecycle.ts Outdated
…ed load

Address review feedback on the teardown-race fix:

- `Lifecycle.registerBeforeClose()` now also guards against close *in progress*
  (new `#isClosing` flag set at the top of `close()`, before the close-callback
  snapshot is taken), not just close *finished*. A hook registered mid-close
  would be added after the snapshot and never run — it is now refused. The
  method returns a boolean indicating whether the hook was registered, plus a
  new `Lifecycle.isClosing` getter. (per CodeRabbit)

- `EggApplicationCore.load()` checks that return value: when registration is
  refused by a teardown race, it cleans up every resource it already created —
  the `unhandledRejection` listener, the messenger (IPC listeners) and any
  lazily-created loggers (file descriptors) — so none leak across files under
  vitest isolate:false, then returns without loading a torn-down app. (per
  Gemini)

- Extend the regression test with a close-in-progress case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/core/src/lifecycle.ts (1)

294-319: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset #isClosing when close() exits.

Line 298 sets #isClosing = true, but close() never flips it back to false. That makes isClosing remain true after completion (or after a thrown close hook), which breaks the getter’s “currently running” contract.

Suggested fix
  async close(): Promise<void> {
    // Mark closing before the close-callback snapshot is taken, so any hook
    // registered while close() is in flight (e.g. an in-flight load racing
    // teardown) is refused by registerBeforeClose() instead of being stranded.
    this.#isClosing = true;
-   if (this.#metadataOnly || this.#snapshotBuilding) {
-     debug('%s skip beforeClose functions in early-exit lifecycle mode', this.app.type);
-     this.#closeFunctionSet.clear();
-   } else {
-     // close in reverse order: first created, last closed
-     const closeFns = Array.from(this.#closeFunctionSet);
-     debug('%s start trigger %d beforeClose functions', this.app.type, closeFns.length);
-     for (const fn of closeFns.reverse()) {
-       debug('%s trigger beforeClose at %o', this.app.type, fn.fullPath);
-       await utils.callFn(fn);
-       this.#closeFunctionSet.delete(fn);
-     }
-   }
-
-   // Be called after other close callbacks
-   this.app.emit('close');
-   this.removeAllListeners();
-   this.app.removeAllListeners();
-   this.#isClosed = true;
-   debug('%s closed', this.app.type);
+   try {
+     if (this.#metadataOnly || this.#snapshotBuilding) {
+       debug('%s skip beforeClose functions in early-exit lifecycle mode', this.app.type);
+       this.#closeFunctionSet.clear();
+     } else {
+       // close in reverse order: first created, last closed
+       const closeFns = Array.from(this.#closeFunctionSet);
+       debug('%s start trigger %d beforeClose functions', this.app.type, closeFns.length);
+       for (const fn of closeFns.reverse()) {
+         debug('%s trigger beforeClose at %o', this.app.type, fn.fullPath);
+         await utils.callFn(fn);
+         this.#closeFunctionSet.delete(fn);
+       }
+     }
+
+     // Be called after other close callbacks
+     this.app.emit('close');
+     this.removeAllListeners();
+     this.app.removeAllListeners();
+     this.#isClosed = true;
+     debug('%s closed', this.app.type);
+   } finally {
+     this.#isClosing = false;
+   }
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/core/src/lifecycle.ts` around lines 294 - 319, The close() method
sets this.#isClosing = true at the start but never resets it to false upon
completion or error, causing the flag to remain true after the method exits and
breaking the getter's contract that it represents whether close is currently
running. Add code to reset this.#isClosing = false when the close() method exits
by either placing it at the end of the method after this.#isClosed = true is
set, or preferably wrapping the method logic in a try-finally block to ensure
the flag is reset even if an error is thrown during close hook execution.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/egg/src/lib/egg.ts`:
- Around line 267-284: The if (!registered) early-return block cleans up
resources but fails to clear the startup timeout timer that was started in
`#setupTimeoutTimer`(), allowing it to fire after the app is torn down. Add a
private property to track the timeout ID returned by `#setupTimeoutTimer`(),
create a new `#clearStartTimeoutTimer`() method to safely clear this timeout,
modify `#setupTimeoutTimer`() to store the timeout ID and call
`#clearStartTimeoutTimer`() when the app is ready, and finally call
`#clearStartTimeoutTimer`() within the if (!registered) block before returning to
prevent the timer from firing on the torn-down app.

---

Outside diff comments:
In `@packages/core/src/lifecycle.ts`:
- Around line 294-319: The close() method sets this.#isClosing = true at the
start but never resets it to false upon completion or error, causing the flag to
remain true after the method exits and breaking the getter's contract that it
represents whether close is currently running. Add code to reset this.#isClosing
= false when the close() method exits by either placing it at the end of the
method after this.#isClosed = true is set, or preferably wrapping the method
logic in a try-finally block to ensure the flag is reset even if an error is
thrown during close hook execution.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24b3a0cd-ecf8-4220-ba81-c53cb843fa58

📥 Commits

Reviewing files that changed from the base of the PR and between 8b68070 and f0af5ef.

📒 Files selected for processing (4)
  • packages/core/src/lifecycle.ts
  • packages/core/test/lifecycle.test.ts
  • packages/egg/src/lib/egg.ts
  • wiki/concepts/vitest-isolate-false-state-leaks.md
✅ Files skipped from review due to trivial changes (1)
  • wiki/concepts/vitest-isolate-false-state-leaks.md

Comment thread packages/egg/src/lib/egg.ts
The egg-bin command tests spawn child `egg-bin` processes (vitest-in-vitest),
which are slow on Windows CI. With full parallelism and a flat 60s timeout, the
`Test bin (windows-latest, 24)` job flakily timed out — individual cases in
`cov.test.ts` / `test.test.ts` / `dev.test.ts` routinely ran 50s+ and tipped
over 60s under child-process contention. This flake reproduces on plain `next`
(e.g. run 27863137812), independent of the lifecycle fix in this PR.

Mirror the root config's Windows handling in `tools/egg-bin/vitest.config.ts`:
on Windows CI, raise `testTimeout`/`hookTimeout` to 120s and cap `maxWorkers` to
2 so fewer child-process trees compete. No effect off Windows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 20, 2026 12:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@killagu killagu changed the title fix(core): tolerate close-hook registration after close (Windows-flaky session test) fix: stabilize Windows-flaky tests (lifecycle teardown race + egg-bin timeouts) Jun 20, 2026
killagu and others added 2 commits June 20, 2026 21:03
`should my-egg-bin nsp success` forked three `egg-bin` subprocesses
sequentially in one test; at ~50s per ts-node/esm spawn on Windows CI that
summed past even the raised 120s timeout. `should show help` similarly chained
two forks. Split both so each test runs a single fork and stays well under the
timeout, removing the last `Test bin (windows-latest, 24)` flake. No behavior
change off Windows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review (CodeRabbit): the `!registered` early-return in load() cleaned up the
unhandledRejection listener, messenger and loggers, but left the startup-timeout
timer from #setupTimeoutTimer() running. On a torn-down app it could still fire
~workerStartTimeout later and re-trigger dump/log side effects (and keep the
event loop alive). Track the timer in a `#startTimeoutTimer` field, add
`#clearStartTimeoutTimer()`, clear it on ready as before, and also clear it in
the early-return path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 20, 2026 13:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@killagu killagu merged commit 7b062db into next Jun 20, 2026
26 checks passed
@killagu killagu deleted the fix/windows-flaky-teardown-race branch June 20, 2026 13:39
killagu added a commit that referenced this pull request Jun 20, 2026
`Test (ubuntu-latest, 24)` flakily failed with `EnvironmentTeardownError:
Cannot load '<file>' ... after the environment was torn down. This is not a
bug in Vitest.` (e.g. loading `duplicate-proto-name-app`'s `AppController.ts`,
or `logrotator`'s `rotate_by_file.ts`).

## Root cause

The framework loads files via dynamic `import()` through
`RealLoaderFS.loadFile`. Tracing the dangling load shows it is a *legitimate,
fully-awaited boot-time load* — e.g. the schedule plugin loading schedule
files in `configDidLoad` (`loadFile <- getExports <- ScheduleLoader <-
loadSchedule <- Scheduler.init/ScheduleWorker.init <- configDidLoad`). There
is no fire-and-forget async leak.

It outlives teardown because of the suite's `isolate: false` (threads) config:
an `import()` issued while constructing an app in one test file's module-runner
environment can have follow-up settling (tsx transform / transitive resolution)
that completes after vitest tears that environment down to move to the next
file. vitest itself labels this "not a bug in Vitest". `loadFile` wrapped and
rethrew it, so it became an unhandled rejection that, under `isolate: false`,
is attributed to whatever unrelated test file is running — failing it at
random. Same class of teardown race as #5978, different symptom.

## Fix

Since the load is correct and awaited, there is nothing to fix at the source;
the only artifact is the benign vitest-internal error. Detect it in `loadFile`
(by `name === 'EnvironmentTeardownError'` and/or the "after the environment was
torn down" message, walking the `cause` chain) and resolve `undefined` instead
of throwing. Loaders already treat an empty module as "no exports", so this is
a safe no-op; genuine load-time failures still propagate unchanged.

The detection is gated on `process.env.VITEST`, so it is unmistakably a
test-runner-only accommodation that can never alter production loader behavior.

Verified against the `@eggjs/mock` suite: on `next` it fails with this
teardown error; with the fix that error is gone. Adds regression coverage for
both the swallow and the still-throws-on-real-error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
killagu added a commit that referenced this pull request Jun 20, 2026
`Test (ubuntu-latest, 24)` flakily failed with `EnvironmentTeardownError:
Cannot load '<file>' ... after the environment was torn down. This is not a
bug in Vitest.` (e.g. loading `duplicate-proto-name-app`'s `AppController.ts`,
or `logrotator`'s `rotate_by_file.ts`).

## Root cause

The framework loads files via dynamic `import()` through
`RealLoaderFS.loadFile`. Tracing the dangling load shows it is a *legitimate,
fully-awaited boot-time load* — e.g. the schedule plugin loading schedule
files in `configDidLoad` (`loadFile <- getExports <- ScheduleLoader <-
loadSchedule <- Scheduler.init/ScheduleWorker.init <- configDidLoad`). There
is no fire-and-forget async leak.

It outlives teardown because of the suite's `isolate: false` (threads) config:
an `import()` issued while constructing an app in one test file's module-runner
environment can have follow-up settling (tsx transform / transitive resolution)
that completes after vitest tears that environment down to move to the next
file. vitest itself labels this "not a bug in Vitest". `loadFile` wrapped and
rethrew it, so it became an unhandled rejection that, under `isolate: false`,
is attributed to whatever unrelated test file is running — failing it at
random. Same class of teardown race as #5978, different symptom.

## Fix

Since the load is correct and awaited, there is nothing to fix at the source;
the only artifact is the benign vitest-internal error. Detect it in `loadFile`
(by `name === 'EnvironmentTeardownError'` and/or the "after the environment was
torn down" message, walking the `cause` chain) and resolve `undefined` instead
of throwing. Loaders already treat an empty module as "no exports", so this is
a safe no-op; genuine load-time failures still propagate unchanged.

The detection is gated on `process.env.VITEST`, so it is unmistakably a
test-runner-only accommodation that can never alter production loader behavior.

Verified against the `@eggjs/mock` suite: on `next` it fails with this
teardown error; with the fix that error is gone. Adds regression coverage for
both the swallow and the still-throws-on-real-error paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
killagu added a commit that referenced this pull request Jun 21, 2026
## Motivation

`Test (ubuntu-latest, 24)` flakily fails with:

```
EnvironmentTeardownError: Cannot load '<file>' ... after the environment was torn down. This is not a bug in Vitest.
```

Observed in CI loading `tegg/plugin/controller`'s
`duplicate-proto-name-app/.../AppController.ts`, and reproduced locally
in the `@eggjs/mock` suite loading `logrotator`'s `rotate_by_file.ts` —
failing unrelated tests (e.g. `mock_httpclient_next` "should auto
restore after each case").

## Root cause (traced)

The framework loads files via dynamic `import()` through
`RealLoaderFS.loadFile`. Instrumenting the dangling load shows it is a
**legitimate, fully-awaited boot-time load** — e.g. the schedule plugin
loading schedule files during `configDidLoad`:

```
RealLoaderFS.loadFile  ← getExports  ← ScheduleLoader.parse/.load
  ← loadSchedule  ← Scheduler.init (agent) / ScheduleWorker.init (app)  ← Boot.configDidLoad
```

Every link awaits: `loadSchedule` awaits the loader, `configDidLoad`
awaits `loadSchedule`, `ready()` awaits `configDidLoad`. Timers /
`serverDidReady` / strategy `start()` do **not** import. So there is
**no fire-and-forget async leak** to fix at the source.

It outlives teardown because of the suite's `isolate: false` (threads)
config: an `import()` issued while constructing an app in one test
file's module-runner environment can have follow-up settling (tsx
transform / transitive resolution) that completes *after* vitest tears
that environment down to move to the next file. vitest itself labels
this `"This is not a bug in Vitest"`. `loadFile` wrapped and rethrew it,
so it became an **unhandled rejection** that, under `isolate: false`,
gets blamed on whatever unrelated test file is running. Same class of
teardown race addressed for a different symptom in #5978.

## Fix

Because the load is correct and awaited, there's nothing to fix at the
source — the only artifact is the benign vitest-internal error. Detect
it in `loadFile` (by `name === 'EnvironmentTeardownError'` and/or the
`"after the environment was torn down"` message, walking the `cause`
chain) and resolve `undefined` instead of throwing.

- **Gated on `process.env.VITEST`** so it is unmistakably a
test-runner-only accommodation that can never alter production loader
behavior.
- Loaders already treat an empty module as "no exports", so this is a
safe no-op.
- **Genuine load-time failures still propagate unchanged.**

## Test evidence

- **Empirical:** on `next`, `pnpm vitest run plugins/mock/test` fails
with the `EnvironmentTeardownError`; with this fix that error is gone.
(A separate, pre-existing `mock error` cross-file leak from
`app.test.ts`/`agent.test.ts` remains — different root cause, out of
scope; it was previously masked by the teardown error.)
- **Unit:** new `loader-fs` regression tests assert `loadFile` resolves
`undefined` for a teardown-style import error and still throws (wrapped)
for a genuine load failure. Full `@eggjs/loader-fs` suite: 4 passed.
- `oxlint --type-aware`, `oxfmt --check`, and `tsgo --noEmit` clean on
the changed files.

> Note: the same `import()`-races-teardown pattern also exists in
`@eggjs/tegg-loader`'s `LoaderUtil.loadFile`; observed failures all
flowed through `@eggjs/loader-fs`, so this PR fixes that path. The tegg
path can get the same guard as a follow-up if it surfaces.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Improved module-loading error handling during test runs: if a module
import occurs after the test environment has been torn down, the loader
now treats it as a benign no-op and resolves `undefined` instead of
failing.
* True load-time failures are still reported as before with the expected
loader-fs error message.
* **Tests**
* Added new fixtures and Vitest coverage to verify teardown-related
import races are handled gracefully, while genuine load errors still
reject.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

egg-mock Opensource version of egg-bin

2 participants