Skip to content

vscode + tower: detect installed-vs-running Tower version divergence; preflight needs an in-memory version probe, not just codev --version #983

@amrmelsayed

Description

@amrmelsayed

Problem

The v3.1.7 #791 CLI preflight verifies that the installed codev CLI is at a version at least as new as the VS Code extension. That check inspects the binary on disk via codev --version. It does not, and currently cannot, tell whether the running Tower process is executing that same code.

After an npm install -g @cluesmith/codev@<newer> (an upgrade), there are two distinct version surfaces:

Surface What it reports When it changes
Installed binary (codev --version) The version on disk The moment npm install -g completes
Running Tower process (in-memory) The version Tower was started with Only on Tower restart

The #791 preflight uses the first surface. The second surface is invisible to it. Result: a user can upgrade the CLI, see the preflight pass cleanly, and still hit unexpected issues because Tower's still serving handlers and wire shapes from the old in-memory code. The local-install workflow (pnpm -w run local-install) restarts Tower as its final step precisely to avoid this; an end-user npm install -g upgrade has no such automatic restart.

Failure mode

A handful of concrete shapes:

  • New extension features that depend on a new Tower API call get a 404 or a BAD_REQUEST from stale Tower because the route doesn't exist in the in-memory code yet.
  • Wire-type changes (a field added to OverviewBuilder, a renamed handler) silently produce undefined values in the extension because Tower's serialiser is still the old one.
  • Bug fixes shipped in the new version (e.g. v3.1.7's vscode: sidebar data (builders, backlog, PRs, etc.) intermittently disappears, then recovers on its own #916 last-known-good cache) don't apply until Tower restarts, even though the user thinks they upgraded.

In all three, the user has no signal that the running and installed versions diverge. The extension reports green; the actual user-visible behaviour is stale.

Proposed mechanism

A small in-memory version probe Tower exposes that the extension can call alongside the existing CLI preflight.

1. Tower endpoint

A read-only HTTP endpoint that returns the version of the currently running Tower process. Something like:

GET /api/version
→ 200 { "version": "3.1.7", "startedAt": "2026-06-04T...", "buildSha": "..." }

Implemented inside Tower's HTTP server so the value is whatever the process actually loaded, not the disk binary. Cheap, no auth needed beyond the existing workspace token.

2. Extension preflight extension

Right after the existing codev --version check resolves OK, fire a follow-up call to /api/version. Three outcomes:

  • OK (running version >= extension's expected version): no signal.
  • Tower running but stale (installed CLI is current, running Tower is older): explicit toast with a Restart Tower action that runs afx tower stop && afx tower start and re-runs the probe.
  • Tower not reachable: existing "not connected to Tower" path applies; no new signal needed.

The toast for the stale case is the new UX. The action button avoids the "go figure out what command to run" dead-end (same shape #982 just raised about the No-active-terminal warning).

3. Status row addition (optional)

The existing Codev CLI row in the Status view (from #791) could carry a second sub-row, or extend its tooltip, to surface running-Tower-version alongside installed-CLI-version. Lets the user see the divergence proactively rather than waiting for the preflight toast.

Design questions (plan-approval)

  1. Compare running Tower against what? The installed CLI version, the extension's expected version, or both? Probably both: running-Tower must be >= extension's expected (the user-facing compatibility rule) AND == installed-CLI (otherwise the user upgraded but didn't get the upgrade).
  2. Auto-restart vs prompt-only? A Restart Tower button on the toast is the lightest UX. Auto-restarting Tower on detected divergence is more aggressive; some users may have in-flight builder work they'd rather finish before a restart.
  3. What if Tower predates the new endpoint? Old Towers won't have /api/version. A 404 from the probe should be treated as "Tower too old to even report its version" and trigger the same Restart Tower toast (with possibly stronger wording).
  4. Frequency. Once at activation? Once every reconnect? On every overview tick? Probably activation + reconnect; per-tick is wasteful since the in-memory version doesn't change without a restart, and a restart will sever and re-establish the connection anyway.
  5. What about cross-machine? A user could be running the VS Code extension on one machine but tunnelling to a Tower on another. The probe still works (HTTP request goes through the tunnel), but the Restart Tower action targets the local Tower which may not be the right one. Plan-gate should pick: scope the action to local-only, or pick up the tunnel-aware target if there is one.

Acceptance (loose, pending plan-gate)

  • Tower exposes a /api/version (or equivalent) read-only endpoint returning its in-memory version.
  • The wire type for the response lives in @cluesmith/codev-types and is consumed by the extension's preflight.
  • When running-Tower version diverges from installed-CLI version (or from extension's expected version), the user gets an actionable toast naming the divergence and offering Restart Tower.
  • The action invokes afx tower stop && afx tower start (or equivalent), then re-probes to confirm.
  • No false positives in healthy state (running == installed == expected): no toast, no extra signal beyond what vscode: startup preflight — verify codev CLI installed and version ≥ extension version; guide install/upgrade otherwise #791 already shows.
  • Backward-compatible against pre-endpoint Towers: a 404 from the probe is interpreted as "Tower too old to report" and triggers the same restart prompt with appropriate wording.

Why cross-cutting

Touches three packages: @cluesmith/codev (the Tower HTTP server, new endpoint), @cluesmith/codev-types (the response wire type), and @cluesmith/codev-vscode (the new probe call + toast). Single area label would understate the coordination cost.

What this isn't

Related

  • #791 — CLI preflight on startup (shipped in v3.1.7). The thing this issue extends.
  • #982 — terminal session gone, no in-extension recovery path. Adjacent UX class: state-divergence with an unactionable signal.
  • scripts/local-install.sh — the local-test path that restarts Tower as its final step. Existence of this script is itself evidence the upgrade-without-restart bug is known internally; this issue surfaces the same fix to end users.

Metadata

Metadata

Assignees

Labels

area/cross-cuttingTouches multiple areas — needs coordinated handling

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions