feat(settings): add auto-detect button for Codex CLI path#47
Conversation
The app must locate the real `codex` binary; auto-discovery sometimes lands on a wrong/stale path and some users don't know how to set it by hand. Add an "Auto-detect" button to Settings -> Codex CLI path that force-rescans (ignoring the cached/override path) every common location plus PATH and verifies each candidate is actually runnable via `codex --version` (timeout-guarded, off the UI thread). A lone runnable hit is applied immediately; several open the picker dialog with the verified candidates; none falls back to manual entry -- and the empty case distinguishes "no codex installed" from "codex present but won't run". Backend: new `redetect_codex_cli_path` command (async/spawn_blocking) + `CodexPathResolver::redetect_runnable_paths` + shared `wait_child_with_timeout`; mac/win symmetric `probe_codex_runnable` + `redetect_runnable_codex_cli_paths`. Frontend (shared, serves mac+win): button, handler, en/zh strings. README (x2) + CHANGELOG updated. Hardening from local review: diagnostic logs on probe/wait failure, lone-hit set-rejection fallback to the dialog, candidate cap, and tests for wait_child_with_timeout (shared) + probe_codex_runnable (win, runs on the Linux CI job).
Reworks the auto-detect button per hands-on testing feedback (the button
is the small-user last-resort, so it must find codex broadly and, when
there are several, list the real ones to pick from):
- More comprehensive discovery: add login-shell resolution
(`$SHELL -lc 'command -v codex'`) so installs on the user's PATH
(nvm / asdf / brew / fnm) are found even under Finder's narrow launchd
PATH, while bypassing any `codex` shell function. Joined with the
managed-shim resolver, Codex.app bundle, fixed locations, and PATH walk.
- Confirm authenticity + capture version: each candidate is probed with
`codex --version` via a new shared `probe_version_with_timeout`
(poll + kill + stdout capture), replacing the bool-only
`wait_child_with_timeout`.
- Multiple installs -> list to pick: the dialog shows each verified
candidate as `path | version` under a "detected (verified runnable)"
heading with a "pick one" copy -- not the misleading "couldn't find
codex" copy that previously appeared even when two were found.
- One install -> applied directly (one-click fallback); none -> manual
dialog, distinguishing "broken install" from "not installed".
CodexCliCandidate{path, version} replaces bare path strings end to end
(model, trait, mac/win impls, TS types, dialog rendering). Tests updated:
probe_version_with_timeout (capture / non-zero / overrun), redetect
multi/empty, win probe_codex_version.
Local silent-failure review of the auto-detect rework flagged a CRITICAL: - discover_codex_via_login_shell ran an UNBOUNDED `$SHELL -lc` — a slow or hung login profile (nvm / asdf / brew shellenv / network-y rc files) would wedge the whole scan: button stuck on "Detecting...", a leaked blocking-pool thread, and no diagnostic. It runs first and synchronously, so one wedged profile froze the whole feature. It now goes through a shared, timeout-bounded helper (8s), matching the per-candidate probes' own stated design intent. - Extracted `run_capturing_stdout_with_timeout` (reader thread + poll + kill) out of `probe_version_with_timeout` so a child that writes more than the ~64KB pipe buffer before exiting can't backpressure its own pipe and deadlock until the timeout (MEDIUM). The version probe is now a thin wrapper taking the first stdout line; the login-shell resolve reuses the helper and takes the LAST line (skipping any profile banner). - spawn errors are now logged instead of being dropped by `.ok()?` (LOW). Adds a regression test that >64KB of stdout is drained, not deadlocked.
|
更新(commit 380e98d): 按反馈把「自动检测」做成给小白的全面兜底:
复审发现并已修一个 CRITICAL:login shell 之前用无超时的 本地全绿:cargo test 104 passed、tsc clean、clippy 无新 warning。新 .app 已重新打包。 |
Addresses 3 chatgpt-codex-connector P2 threads on the auto-detect PR: - run_capturing_stdout_with_timeout: the stdout reader was join()'d unbounded after the child exited. If the child exits but leaves a background process holding the stdout pipe, read_to_string never sees EOF and join() blocks forever, defeating the timeout. The reader now posts to a channel and we wait with a bounded recv_timeout, abandoning the detached reader after a short grace. - discover_real_codex_cli_from_shell: the managed-shim resolver script ran via unbounded `.output()`; it now goes through the same bounded helper as the login-shell resolve and the per-candidate probes, so no shell-touching spawn in the scan is unbounded anymore. (The login-shell `$SHELL -lc` thread was already bounded in 380e98d.)
| // Detected-mode opens after auto-detect found several runnable codex, | ||
| // so use a "pick one" copy — NOT the default "couldn't find it" copy | ||
| // that misled users into thinking detection had failed. | ||
| const hasDetected = detectedCandidates !== undefined && detectedCandidates.length > 0; | ||
| elements.codexCliDialogCopy.textContent = hasDetected | ||
| ? t(state.locale, "codexCliDetectPickCopy") | ||
| : t(state.locale, "codexCliDialogCopy"); |
There was a problem hiding this comment.
🟡 Periodic applyLocale() overwrites auto-detect dialog copy within 15 seconds of opening
When the auto-detect feature opens the dialog in "detected mode" (multiple candidates found), it sets custom dialog text at src-tauri/shared/front/actions.ts:761-763 ("pick one" copy) and a custom heading via renderCodexCliStatus at src-tauri/shared/front/actions.ts:699-701 ("Detected (verified runnable)"). However, rerenderDashboard() runs every 15 seconds (src-tauri/shared/front/actions.ts:1060-1062) and unconditionally calls applyLocale(), which resets elements.codexCliDialogCopy.textContent to the default "couldn't find codex" message (src-tauri/shared/front/render.ts:752) and resets elements.codexCliSuggestionsHeading to "Common locations" (src-tauri/shared/front/render.ts:756). The user sees the dialog text change from the correct "pick one" message to the misleading "couldn't find codex" message within at most 15 seconds of opening the dialog, despite the suggestion chips (the actual path buttons) remaining correct.
Prompt for agents
The bug is caused by the periodic `applyLocale()` (called from `rerenderDashboard()` every 15 seconds) unconditionally overwriting dialog text elements, even when those elements were set to contextual values by `openCodexCliDialog`.
Files involved:
- `src-tauri/shared/front/render.ts` lines 751-756 (`applyLocale` function): unconditionally sets `codexCliDialogCopy` and `codexCliSuggestionsHeading` to default i18n keys
- `src-tauri/shared/front/actions.ts` lines 757-763 and 698-701 (`openCodexCliDialog` and `renderCodexCliStatus`): set contextual values for detected mode
Possible approaches:
1. Have `applyLocale()` skip overwriting dialog copy/heading if the codex CLI dialog is currently open (check `elements.codexCliDialog.open`).
2. Store a flag (e.g. `state.codexCliDialogDetectedMode`) that `applyLocale()` checks before overwriting; clear it when the dialog closes.
3. Move the dialog text overwrite out of `applyLocale()` and into the dialog open/close logic exclusively.
Note: This same class of pre-existing issue also affects `submitCodexCliButton` text ("Save & retry login" reverts to "Save"), but it's most visible/confusing with the new "detected pick" vs "couldn't find" copy distinction.
Was this helpful? React with 👍 or 👎 to provide feedback.
What
Adds an Auto-detect button to Settings → "Codex CLI path", beside "Change".
The app needs the real
codexbinary to log in / launch Codex. Auto-discovery sometimes lands on a wrong or stale path, and some users don't know how to set it by hand — the existing dialog only offers manual entry + static "common location" hints.The new button force-rescans (ignoring the cached/override path) every common install location + PATH, and verifies each candidate is actually runnable via
codex --version(timeout-guarded, off the UI thread on the blocking pool):How
Backend (
src-tauri)redetect_codex_cli_path(async /spawn_blocking, since each probe spawns a child).CodexPathResolver::redetect_runnable_pathstrait method; sharedwait_child_with_timeout(poll-wait + kill, reaps on every branch so no zombies).probe_codex_runnable(codex --version, per-platform timeout: 3s mac / 5s win) +redetect_runnable_codex_cli_paths(candidate-capped).CodexCliRedetectResultmodel.Frontend (
src-tauri/shared/front/*, one edit serves both mac & win): button wiring,handleDetectCodexCli, en/zh strings; HTML rows (mac+win) + CSS.Local review hardening
Ran 3 local review agents (code-reviewer / silent-failure-hunter / pr-test-analyzer) before pushing and applied their findings:
io::Errorin the timeout helper).setCodexCliPathrejection → fallback to the dialog (prefilled) instead of a raw error toast.MAX_PROBE_CANDIDATES = 12) so a pathological PATH can't stall the scan.Tests / verification
cargo test --lib: 103 passed (incl. newwait_child_with_timeoutfast-exit / overrun + redetect empty / multiple).probe_codex_runnabletest placed in the win module so it runs on the Linux CIcargo test --libjob (the mac module isn't test-run in CI).tsc --noEmit: clean ·vite build: ok · clippy: new code clean.x86_64-pc-windows-msvccross-check is blocked byring's C-SDK requirement — an env limitation, not a code issue — so CI's Linux job is the real gate for the win branch).🤖 Generated with Claude Code