feat: cache iOS runner artifacts during prepare by thymikee · Pull Request #688 · callstack/agent-device

thymikee · 2026-06-04T00:16:51Z

Summary

prepare ios-runner now owns Apple runner artifact restore/build/health behavior: cache hits launch from the restored .xctestrun, misses build through prepare, and bad restored artifacts are marked, cleaned, rebuilt once, and health-checked again.

The runner path is now organized around an Apple runner runtime/provider contract with runCommand, optional prepare, and optional best-effort prewarm. Local XCUITest execution implements that runtime, while provider-backed integration scenarios can own prepare directly without reducing it to a raw uptime command.

Apple platform policy moved out of the xctestrun artifact module into apple-runner-platform.ts, so SDK names, derived-cache folders, xctestrun hints, and destination strings are table-driven for iOS, tvOS targets, and macOS. That keeps future iPad/watchOS-style expansion additive instead of spreading conditionals through artifact/cache code.

The iOS CI and nightly replay workflows dogfood this path by restoring/saving runner cache and running prepare ios-runner --timeout 300000 --json as the single build-or-health gate.

Touched files: 23. Scope expanded from runner runtime to CI workflow/action wiring so this repo uses the new prepare-owned cache path.

Validation

pnpm check:quick passed after the final refactor.

Focused provider/runner coverage passed: pnpm exec vitest run src/platforms/ios/__tests__/runner-provider.test.ts src/platforms/ios/__tests__/runner-client.test.ts src/platforms/ios/__tests__/runner-command-retry.test.ts test/integration/provider-scenarios/macos-desktop.test.ts test/integration/provider-scenarios/ios-lifecycle.test.ts.

pnpm build:xcuitest passed for iOS and macOS runner builds.

pnpm check:unit passed outside the sandbox. The sandboxed attempt failed on unrelated local listener/process restrictions such as listen EPERM, so it was rerun with escalation.

pnpm format was run; unrelated existing formatter churn was reverted so this diff stays scoped.

Earlier PR validation also covered fallow, provider integration, smoke, and coverage for the cache/prepare implementation.

github-actions · 2026-06-04T00:18:26Z

Size Report

Metric	Base	Current	Diff
JS raw	1.1 MB	1.1 MB	+5.7 kB
JS gzip	368.6 kB	370.3 kB	+1.7 kB
npm tarball	474.6 kB	476.7 kB	+2.1 kB
npm unpacked	1.6 MB	1.6 MB	+5.7 kB

Startup median (7 runs, lower is better):

Scenario	Base	Current	Diff
CLI --version	25.6 ms	25.4 ms	-0.2 ms
CLI --help	41.3 ms	40.9 ms	-0.4 ms

Top changed chunks:

Chunk	Raw diff	Gzip diff
`dist/src/2415.js`	+5.5 kB	+1.6 kB
`dist/src/session.js`	+126 B	+53 B
`dist/src/9542.js`	+25 B	+10 B
`dist/src/args.js`	+2 B	+2 B
`dist/src/1352.js`	+14 B	+2 B

thymikee

Review: prepare-owned Apple runner cache

Reviewed for regressions, perf, data-flow, platform parity, and API usage. This is a strong refactor — sharing the high-signal points.

What works well

Cleaner data flow / decomposition. runner-client.ts shrinks from ~611 lines to a thin facade; orchestration moves into runner-lifecycle.ts, transport-error recovery into runner-command-recovery.ts, and platform policy into apple-runner-platform.ts. The AppleRunnerProvider contract (runCommand + optional prepare/prewarm) is a genuinely simpler seam — provider-backed scenarios can own prepare instead of faking an uptime command.
Platform parity. The table-driven RUNNER_PLATFORM_PROFILES (iOS/tvOS/macOS) plus the isApplePlatform gate in the daemon prepare handler makes future iPad/watchOS-style additions additive instead of conditional sprawl. The moved resolveRunner* helpers look behavior-equivalent to the originals.
Correct replay safety. The recovery path correctly distinguishes before-send failures (Runner did not accept connection, readiness-preflight) — which it safely replays on a fresh session — from after-send transport loss, which goes through status-probe recovery and never blindly replays a mutating command. That distinction is the crux of correctness here and it's preserved.
Cache correctness win. Folding Xcode/SDK fingerprints into the cache metadata means cached .xctestrun artifacts now invalidate across toolchain upgrades — a real bug class avoided.

Worth addressing

failureReason semantics on a successful recovery (highest signal). In prepareLocalIosRunner, the bad-cache→rebuild→health-check-passes path returns a result that still carries failureReason: reason, recordPrepareResult then logs it at warn, and the daemon handler spreads ...result into the --json response. So a prepare that ultimately succeeded surfaces a failureReason and a warn-level log. CI/agents parsing the JSON could read that as a failure. Consider renaming to recoveredFromReason/recoveryReason (or omitting it on success) to keep "prepare succeeded" unambiguous.
resolveExpectedRunnerCacheMetadata now spawns subprocesses. It shells out to xcodebuild -version and two xcrun calls. It's memoized per-process via appleToolFingerprintCache, which is the right call — but it converts a previously-pure metadata function into one doing blocking subprocess I/O on first use. Worth confirming this is never invoked on a latency-sensitive per-command path (it looks build/cache-key scoped, which is fine).
Schema version not bumped. RUNNER_CACHE_SCHEMA_VERSION stays 1 despite five new metadata fields. It still works (full-JSON comparableRunnerCacheMetadata equality + the hash-derived cache key both naturally invalidate stale entries), but bumping it would make the invalidation intent explicit and avoid confusing "same schemaVersion, different shape" metadata in logs.
Order-sensitive metadata equality duplicated across two files. Reuse hinges on JSON.stringify(comparableRunnerCacheMetadata(...)) equality, and the derived cache key is a hash of the same. The CI .mjs (write-xcuitest-cache-metadata.mjs) must emit byte-identical key order, and this PR widens that implicit contract by 5 fields across both files. They currently match, but a sorted-key canonicalization would make this robust against future reordering.
Vestigial clean-derived: "1". With build-on-miss: "false", that input now only feeds the skipped "Build replay artifacts" step (env is scoped there, not job-level), so it's effectively dead in ios.yml/replays-nightly.yml. Dropping it would avoid implying the prepare step cleans the restored cache (it correctly doesn't — AGENT_DEVICE_IOS_RUNNER_DERIVED_PATH is job-level and AGENT_DEVICE_IOS_CLEAN_DERIVED is not, so the restore→reuse flow is intact).

Nothing here looks blocking; the failureReason-on-success naming is the one I'd most want resolved before merge.

Generated by Claude Code

thymikee · 2026-06-04T21:49:47Z

Addressed the review feedback in 08dbb4a26:

Successful bad-cache recovery now returns/logs recoveryReason instead of failureReason, and apple_runner_prepare stays info level. Actual failed recovery still uses failureReason.
Bumped runner artifact cache metadata schema to 2 in both agent-device and the CI metadata writer.
Made runner cache metadata comparison/cache-key hashing use stable-key JSON so field ordering is no longer an implicit contract.
Removed dead clean-derived: "1" from the build-on-miss: "false" iOS CI/nightly setup calls.

Validation:

pnpm exec vitest run src/platforms/ios/__tests__/runner-command-retry.test.ts src/platforms/ios/__tests__/runner-xctestrun.test.ts src/platforms/ios/__tests__/runner-client.test.ts passed.
pnpm check:quick passed.
pnpm check:unit was rerun outside the sandbox; it hit existing timing/tooling flakes in unrelated daemon-client/iOS status-bar/Android bundletool tests, while the focused runner/cache tests above passed.

github-actions · 2026-06-04T22:37:21Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-04 22:37 UTC

thymikee force-pushed the codex/ios-runner-prepare-cache-ci branch 4 times, most recently from 6e6a593 to 73713ab Compare June 4, 2026 01:21

thymikee commented Jun 4, 2026

View reviewed changes

thymikee added 6 commits June 4, 2026 15:17

feat: cache ios runner artifacts during prepare

c996571

refactor: deepen ios runner lifecycle

80cd41a

refactor: generalize apple runner prepare

aa97d84

refactor: simplify apple runner lifecycle

0893d71

fix: clarify runner prepare recovery diagnostics

08aef44

refactor: trim apple runner provider surface

5807622

thymikee force-pushed the codex/ios-runner-prepare-cache-ci branch from 952f43e to 5807622 Compare June 4, 2026 22:20

test: simplify runner recovery diagnostics assertion

e4df13d

thymikee merged commit ad7b386 into main Jun 4, 2026
18 checks passed

thymikee deleted the codex/ios-runner-prepare-cache-ci branch June 4, 2026 22:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: cache iOS runner artifacts during prepare#688

feat: cache iOS runner artifacts during prepare#688
thymikee merged 7 commits into
mainfrom
codex/ios-runner-prepare-cache-ci

thymikee commented Jun 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

thymikee left a comment

Uh oh!

thymikee commented Jun 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thymikee commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Size Report

Uh oh!

thymikee left a comment

Choose a reason for hiding this comment

Review: prepare-owned Apple runner cache

What works well

Worth addressing

Uh oh!

thymikee commented Jun 4, 2026

Uh oh!

Uh oh!

github-actions Bot commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thymikee commented Jun 4, 2026 •

edited

Loading

github-actions Bot commented Jun 4, 2026 •

edited

Loading