
Central-config-driven client actions: daemon action reconciler + backfill-on-join (LLP 0041/0043) #166

Merged
philcunliffe merged 16 commits into master from integration/central-config-client-actions on Jun 26, 2026


philcunliffe commented Jun 26, 2026 (Contributor)

Implements the central-config-driven client action reconciler and its first instance, backfill-on-join.

Design: LLP 0041 (llp/0041-central-config-client-actions.design.md)
Plan: LLP 0043 (llp/0043-central-config-client-actions.plan.md)
Decisions/requests: LLP 0036 (seam) + LLP 0037 (backfill-on-join)

Tasks landed (each a verified --no-ff merge with a Task-Id: trailer):

  • T1 action reconciler core + client-actions.json marker store
  • T2 confirmation-edge onConfirmed hook in the config apply engine
  • T5 per-plugin backfill config validation (@hypaware/claude, @hypaware/codex)
  • T3 backfill action handler (backfillHandler over selectProviders + per-plugin config)
  • T6 clientActions status surface in the daemon status report
  • T4 daemon wiring (construct reconciler, wire onConfirmed, after-activation pass, single-flight guard)

Generated by neutral's implement-changeset wave loop; completion was re-derived from `neutral ready` (all 6 tasks done, none blocked).

Change-Set: central-config-client-actions

philcunliffe and others added 14 commits June 25, 2026 15:17
…sign

Covers the two accepted-but-uncovered decisions LLP 0036 (the action
reconciler seam) and LLP 0037 (backfill-on-join, its first instance) with
one neutral-minted implementation design.

Specifies: the daemon-side action reconciler component and where it fires
in the existing lifecycle (the confirmPoll edge + an after-activation
already-confirmed pass), an onConfirmed hook on the apply engine, the
run-once completion marker (config-control/client-actions.json), the
per-plugin backfill config + window_days->--since resolution, subprocess
execution of `hyp backfill`, failure-surfaced-not-fatal status, and a
clientActions status section. Breaks the work into six independently-
mergeable task seams with a matching test strategy, and carries forward
the decisions' open questions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refine the LLP 0041 implementation design into six independently-mergeable
tasks with explicit code dependencies, so neutral can schedule a parallel
first wave (reconciler core, confirmation-edge hook, per-plugin backfill
config validation) ahead of the handler, status surface, and daemon wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an optional `onConfirmed` callback to CreateConfigControlOptions and
invoke it from confirmPoll() exactly once on the probation active→cleared
transition (the early no-probation return guards every other poll). This
emits a precise confirmation edge the daemon can wire to schedule an
action-reconciler pass without polling configControl.status() each tick,
while keeping apply.js ignorant of the reconciler. Ships a no-op edge
until the daemon (T4) consumes it; existing apply tests are unaffected.

Adds a test asserting the hook fires on the active→cleared edge and not
on a no-probation poll. Annotates the edge with @ref LLP 0041.

Task-Id: T2
Add the generic, daemon-constructed client-action reconciler (LLP 0036 /
LLP 0041) and its marker store — the spine every other action task binds to.

- src/core/config/action_reconciler.js: createActionReconciler({ stateRoot,
  handlers, now, log }) -> { reconcile, readStatus }. reconcile() is
  level-triggered: per handler, diff desired() against the persisted markers
  and act only on the gap; a done marker short-circuits (run-once), a failed
  marker retries with bumped attempts, and reversible handlers undo a key the
  config no longer names. Atomic tmp+rename read/write (mode 0600) of
  config-control/client-actions.json, beside the apply engine's state.json.
  Standalone readClientActionStatus({ stateRoot }) for the status surface.
- src/core/config/types.d.ts: ActionHandler, ActionMarker, ActionMarkerStore,
  DesiredAction, ActionOutcome, ActionContext, ReconcileInput/Report,
  ClientActionStatus, ActionReconciler, CreateActionReconcilerOptions.
- test/core/action-reconciler.test.js: run-once idempotency + done
  short-circuit, missed-pass recovery, atomic marker round-trip (mode/newline),
  failed-then-done retry with attempts, thrown-perform normalization, a
  desired() throw not wedging other handlers, and the reverse path.
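The level-triggered pass described above can be sketched roughly as follows. This is an illustrative approximation, not the code in action_reconciler.js: `reconcileOnce`, the `handler.id:requestKey` marker key, and the outcome shape are assumptions, and both persistence and the reversible-handler undo path are omitted.

```javascript
// Hypothetical sketch of a level-triggered, run-once reconcile pass.
// `markers` is a plain object keyed by `${handler.id}:${requestKey}`;
// the real store shape (client-actions.json) may differ.
async function reconcileOnce({ handlers, markers, input, now = () => Date.now() }) {
  const report = [];
  for (const handler of handlers) {
    let desired;
    try {
      desired = handler.desired(input);
    } catch (err) {
      // A throwing desired() skips this handler but must not wedge the others.
      report.push({ handler: handler.id, error: String(err) });
      continue;
    }
    for (const action of desired) {
      const key = `${handler.id}:${action.requestKey}`;
      const prev = markers[key];
      if (prev && prev.state === 'done') continue; // run-once short-circuit
      let outcome;
      try {
        outcome = await handler.perform(action.params, input);
      } catch (err) {
        outcome = { state: 'failed', error: String(err) }; // normalize throws
      }
      // A failed marker is retried on the next pass with attempts bumped.
      markers[key] = {
        state: outcome.state,
        attempts: (prev ? prev.attempts : 0) + 1,
        at: now(),
      };
      report.push({ key, state: outcome.state });
    }
  }
  return report;
}
```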

Unit-testable with an injected handler + clock; no daemon, HTTP, or real
spawn. Inert until the daemon wires it (T4).

Task-Id: T1

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/codex (T5)

Add a `config_sections` manifest entry plus a ConfigRegistry section
validator to the claude and codex adapters so each plugin's own
`config.backfill` policy ({ on_join, window_days }) is validated by the
owning plugin (LLP 0005 / LLP 0037). Plugin-local: no top-level
`backfill` section and nothing new for core to validate. Unknown sibling
keys (e.g. `proxy`) pass through untouched; the `backfill` block is
checked strictly (on_join boolean, window_days positive integer, no
unknown keys) so typos surface instead of being silently ignored.
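A rough sketch of the strict `backfill` check, assuming a validator that receives the plugin's `config` object and returns a list of error strings. `validateBackfillBlock` and its return shape are hypothetical; the real `registerSection` validator contract may differ.

```javascript
// Hypothetical validator for a plugin's config.backfill block. Returns an
// array of error strings; an empty array means the block is valid.
function validateBackfillBlock(pluginConfig) {
  const errors = [];
  if (pluginConfig == null || pluginConfig.backfill === undefined) return errors;
  const backfill = pluginConfig.backfill;
  // Sibling keys such as `proxy` are never inspected here.
  if (typeof backfill !== 'object' || backfill === null || Array.isArray(backfill)) {
    return ['backfill must be an object'];
  }
  for (const key of Object.keys(backfill)) {
    if (key !== 'on_join' && key !== 'window_days') {
      errors.push(`backfill: unknown key "${key}"`); // typos surface here
    }
  }
  if ('on_join' in backfill && typeof backfill.on_join !== 'boolean') {
    errors.push('backfill.on_join must be a boolean');
  }
  if ('window_days' in backfill &&
      !(Number.isInteger(backfill.window_days) && backfill.window_days > 0)) {
    errors.push('backfill.window_days must be a positive integer');
  }
  return errors;
}
```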

The validator registers via ctx.configRegistry.registerSection so the
kernel runs it through runPerPluginSectionValidators. The central-locked
on_join cannot be flipped by a colliding local plugin entry — that falls
out of the existing LLP 0031 plugins[] merge model (covered by a test).

Task-Id: T5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns/T1' into integration/central-config-client-actions
…ns/T2' into integration/central-config-client-actions
…ns/T5' into integration/central-config-client-actions
…lugin config (T3)

Add the v1 instance of the generic client-action reconciler (LLP 0036 /
LLP 0037 / LLP 0041 Part 2): src/core/config/action_backfill.js exporting
createBackfillHandler({ spawn }) and the default backfillHandler the daemon
(T4) constructs the reconciler with.

desired() reuses the exact "enabled-in-config" predicate `hyp backfill`
uses (selectProviders with no explicit names), then drops any provider
whose owning plugin set backfill.on_join:false — the operator opt-out that
rides the locked plugins[] entry (LLP 0041 consent gating). It emits the
owning plugin name as the run-once requestKey (the per-(machine,provider)
marker key) while carrying the provider name in params, because the CLI
positional is the provider name (`hyp backfill claude`), not the plugin
name.
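The desired() filter and its key/param split might look like this. The input shapes are assumptions: `providers` stands in for whatever `selectProviders` returns, and `pluginConfigs` maps plugin names to their config objects. The opt-out check mirrors this commit's description (an explicit `on_join: false`).

```javascript
// Hypothetical desired() filter. `providers` stands in for selectProviders()
// with no explicit names; each entry names a provider and its owning plugin.
function desiredBackfills(providers, pluginConfigs) {
  const actions = [];
  for (const { provider, plugin } of providers) {
    const backfill = (pluginConfigs[plugin] || {}).backfill;
    // Operator opt-out riding the locked plugins[] entry.
    if (backfill && backfill.on_join === false) continue;
    actions.push({
      requestKey: plugin,   // run-once marker key: the owning plugin
      params: { provider }, // CLI positional: the provider name
    });
  }
  return actions;
}
```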

perform() resolves window_days -> --since (now - windowDays.days; omitted
when absent so `hyp backfill` falls back to the retention window) and spawns
`hyp backfill <provider> [--since <iso>] --json` via the runSmoke spawn
pattern. The spawn is async (not spawnSync) so a months-deep import never
blocks the daemon event loop (LLP 0041 execution isolation), is injectable
for tests, and inherits the daemon env (HYP_HOME). Exit 0 sums
providers[].rows_written into a done outcome; a non-zero exit / spawn error
yields failed so the reconciler retries next pass.
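A sketch of the perform() mechanics under stated assumptions: `buildBackfillArgv`, `toOutcome`, and `runBackfill` are hypothetical helper names, and treating an unparseable `--json` body as a failure is an added choice (the commit only specifies exit-code handling). It uses Node's async `child_process.spawn`, so the event loop stays free during a long import.

```javascript
const { spawn } = require('node:child_process');

const DAY_MS = 24 * 60 * 60 * 1000;

// Build `hyp backfill <provider> [--since <iso>] --json`. When windowDays is
// absent, --since is omitted so `hyp backfill` falls back to retention.
function buildBackfillArgv(provider, windowDays, nowMs) {
  const argv = ['backfill', provider];
  if (windowDays !== undefined) {
    argv.push('--since', new Date(nowMs - windowDays * DAY_MS).toISOString());
  }
  argv.push('--json');
  return argv;
}

// Map a finished child to an outcome, summing providers[].rows_written.
function toOutcome(exitCode, stdout) {
  if (exitCode !== 0) return { state: 'failed', error: `exit ${exitCode}` };
  try {
    const parsed = JSON.parse(stdout);
    const rows = (parsed.providers || []).reduce((n, p) => n + (p.rows_written || 0), 0);
    return { state: 'done', rows };
  } catch (err) {
    return { state: 'failed', error: `bad --json output: ${err.message}` };
  }
}

// Async spawn with an explicit env; resolves (never rejects) with an outcome.
function runBackfill(command, argv, env) {
  return new Promise((resolve) => {
    const child = spawn(command, argv, { env, stdio: ['ignore', 'pipe', 'inherit'] });
    let stdout = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.on('error', (err) => resolve({ state: 'failed', error: err.message }));
    child.on('close', (code) => resolve(toOutcome(code, stdout)));
  });
}
```

Passing the env explicitly, rather than reading `process.env` inside the helper, leaves the caller in control of which `HYP_HOME` the child sees.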

Types (BackfillSpawn, BackfillSpawnResult, BackfillSpawnArgs,
CreateBackfillHandlerOptions) land in src/core/config/types.d.ts. Tests
cover the opt-out, window_days->--since resolution + retention fallback
(argv asserted), row summing, failure paths, and an end-to-end run through
the T1 reconciler (failed marker -> retry -> done -> run-once skip).

Task-Id: T3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a `clientActions` section to `HypAwareStatusReport`, read via
`readClientActionStatus` (no reconcile pass), surfacing per-provider
backfill state as done / failed / pending / n-a. `done`/`failed` come
straight from the persisted marker store (any request key); `pending`/`n-a`
are derived for declared backfill targets (a plugin entry carrying its own
`config.backfill` block) — suppressed (`on_join:false`) or inert (host
never joined) → `n/a`, otherwise desired-but-unrun → `pending`.
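The per-provider state derivation can be sketched as a small pure function. `clientActionState` and its input fields are hypothetical; only the four states and their precedence come from the description above.

```javascript
// Hypothetical derivation of one provider's clientActions state.
// `marker` is the persisted entry (or undefined); `declared` means the plugin
// entry carries its own config.backfill block.
function clientActionState({ marker, declared, joined, onJoin }) {
  if (marker && (marker.state === 'done' || marker.state === 'failed')) {
    return marker.state; // straight from the marker store
  }
  if (!declared) return null; // not a backfill target: nothing to show
  if (!joined || onJoin === false) return 'n/a'; // inert or suppressed
  return 'pending'; // desired but not yet run
}
```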

Wired into both the text and JSON status renderers. A `failed` entry is its
own status line and is deliberately excluded from `overall === 'degraded'`
(LLP 0041 §failure-is-surfaced-not-fatal) — it is not even a diagnostic, so
it cannot reach the overall computation. Null when nothing applies, so the
V1 status surface is unchanged on an ordinary host.

New types `ClientActionState` / `ClientActionReport` / `ClientActionsReport`
in src/core/daemon/types.d.ts.

@ref LLP 0036 — central-config-driven client action seam
@ref LLP 0041#idempotency-and-completion-state — marker-derived status view

Task-Id: T6

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns/T3' into integration/central-config-client-actions
…ns/T6' into integration/central-config-client-actions
Wire the client-action reconciler (LLP 0036 / 0037 / 0041) into
`runDaemon`. The daemon is the only host with `configControl`, so a
reconciler attached here is daemon-only by construction.

- Construct the reconciler with the v1 `[backfillHandler]` (injectable
  via `RunDaemonOptions.actionReconciler` as a test seam), passing
  `boot.config` (effective) and `boot.runtime.backfills` into each
  `reconcile()` pass.
- Wire `configControl`'s `onConfirmed` hook to schedule a reconcile pass
  on the probation active->cleared edge; `apply.js` stays ignorant of the
  reconciler. An edge that races the tail of boot (before the scheduler
  is wired) is recovered by the after-activation pass.
- Run the after-activation already-confirmed pass, gated on a present
  central layer and no active probation (a fresh join waits for the edge;
  a non-joined host stays a no-op).
- Add `createReconcilePassScheduler`: a single-flight guard that runs each
  pass as its own async task off the tick loop, coalescing concurrent
  edges into exactly one rerun, and `settle()` awaited on shutdown so the
  daemon never exits mid-import.
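A minimal sketch of a single-flight scheduler with coalescing and `settle()`. `createPassScheduler` is a hypothetical stand-in for `createReconcilePassScheduler`, whose real signature is not shown in this PR, and the `'coalesced'` reason label is invented for illustration.

```javascript
// Hypothetical single-flight pass scheduler: at most one pass runs at a time,
// and any requests arriving mid-pass coalesce into exactly one rerun.
function createPassScheduler(runPass, log = () => {}) {
  let running = null; // promise for the in-flight loop, if any
  let rerun = false;  // set when a request arrives during a pass

  async function loop(reason) {
    let next = reason;
    do {
      rerun = false;
      try {
        await runPass(next);
      } catch (err) {
        log(`reconcile pass failed: ${err}`); // a throw never kills the loop
      }
      next = 'coalesced';
    } while (rerun);
    running = null;
  }

  return {
    schedule(reason) {
      if (running) { rerun = true; return; }
      // Defer to a fresh task so the caller (e.g. the tick loop) never waits.
      running = new Promise((resolve) => setImmediate(resolve)).then(() => loop(reason));
    },
    // Await on shutdown so the process never exits mid-pass.
    settle() { return running || Promise.resolve(); },
  };
}
```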

Tests: scheduler single-flight / off-tick / coalescing / throw-recovery
units, plus daemon integration tests for the boot pass firing (central +
no probation) and staying inert (no central layer; active probation).

Task-Id: T4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns/T4' into integration/central-config-client-actions
philcunliffe commented (Contributor, Author)

Dual-agent review — request_changes

  • Verdict: request_changes
  • Risk class: medium (daemon runtime + config validation + plugin config)
  • Reviewers: Codex (gpt-5.5) + 5 Claude reviewers, run in an isolated git worktree at the PR head.
  • Advisory only: no merge was attempted.

Two majors are independently confirmed by multiple reviewers; the rest are corroborating coverage/robustness gaps.

Major — backfill child spawns with process.env, not the daemon's resolved env/HYP_HOME

src/core/config/action_backfill.js:134 spawns with env: process.env, but the daemon resolves its own env/hypHome from opts.env (src/core/daemon/runtime.js:100). ActionContext/ReconcileInput carry no env, so perform() structurally cannot reach the daemon's env. When the two diverge — exactly the direct-runDaemon / hermetic-smoke boot shape the project treats as a gate — the child imports into a different HYP_HOME cache (invisible to the forward sink) while the marker still records done. This is an explicit divergence from LLP 0041 §Run-once flow step 2 ("child inherits the daemon's env, notably HYP_HOME"). (Codex + 3 Claude reviewers.)
Fix: thread the daemon's resolved env into ReconcileInput/ActionContext (forcing HYP_HOME=hypHome), pass it to spawn() instead of process.env, and assert calls[N].env.HYP_HOME in the perform tests.
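The proposed fix amounts to deriving the child env from the daemon's resolved env and carrying it through the context to spawn(). A minimal sketch, where `childEnvFor`, `performWithEnv`, and the `ctx.env` field are hypothetical names, with the test asserting `env.HYP_HOME` on the recorded spawn call as the review suggests:

```javascript
// Hypothetical helper: build the child's env from the daemon's resolved env
// (not process.env), forcing HYP_HOME to the daemon's resolved hypHome.
function childEnvFor(daemonEnv, hypHome) {
  return { ...daemonEnv, HYP_HOME: hypHome };
}

// perform() forwards ctx.env to the injected spawn instead of process.env.
async function performWithEnv(spawnFn, argv, ctx) {
  return spawnFn('hyp', argv, { env: ctx.env });
}
```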

Major — per-plugin backfill validators are dead in production

The T5 config_sections validators for @hypaware/claude/@hypaware/codex are registered only at plugin activation, but no validation path (boot merge boot.js:339, apply apply_deps.js:56, status, hyp config validate) passes a configRegistry, so runPerPluginSectionValidators short-circuits (validate.js:606). The validators never run, and readBackfillPolicy fails open on a non-boolean on_join (a typo'd on_join: "false" is silently accepted and backfill runs anyway). (Codex + 2 Claude reviewers.)
Fix: thread the live configRegistry into the apply/boot validation pass; tighten readBackfillPolicy so a non-boolean on_join does not fail open; add a test that drives a malformed backfill block through the real boot/apply validation and asserts a config_section_invalid error.
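One way to read the policy so a malformed value cannot fail open, assuming a tri-state result. `readBackfillPolicy` is the name the review uses, but this body and its string results are illustrative, not the project's code. Treating a non-object `backfill` value as an opt-out goes beyond what the review specifies.

```javascript
// Hypothetical tri-state policy read: 'default-on' | 'on' | 'opt-out'.
// Only an explicit boolean true enables; any other present on_join value
// (false, "false", 0, ...) is an opt-out, so a typo cannot fail open.
function readBackfillPolicy(pluginConfig) {
  const backfill = pluginConfig ? pluginConfig.backfill : undefined;
  if (backfill === undefined) return 'default-on';
  if (backfill === null || typeof backfill !== 'object') return 'opt-out'; // malformed block
  if (!('on_join' in backfill)) return 'default-on';
  return backfill.on_join === true ? 'on' : 'opt-out';
}
```

Sharing one helper like this between the reconciler and the status surface keeps the two from disagreeing about a malformed value.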

Major — the fresh-join confirm-edge reconcile (primary LLP 0037 path) is untested

While probation is active the boot already-confirmed pass is gated off, so onConfirmed → scheduleReconcile('confirm-edge') is the only backfill-on-join trigger — yet test/core/daemon-reconcile.test.js:206 only asserts the pass does not fire and never clears probation to prove it fires on the clearing edge.
Fix: add a daemon-reconcile test that boots with active probation (no pass), drives the confirmation edge via the configControl seam, and asserts exactly one pass runs with the effective config + backfill registry.

Minor — default-on backfill is invisible in hyp status

An enabled backfill-capable plugin with no explicit config.backfill block is treated as default-on by the reconciler, but status only derives pending/n-a from entries that explicitly contain config.backfill, so a pending default-on action is invisible before it runs (action_backfill.js:73/181, status.js:573). Align status derivation with backfillHandler.desired().

Minor — no end-to-end join→backfill hermetic smoke

Plan LLP 0043 binds an end-to-end hermetic smoke to T4 (this PR), but none was added/extended under hypaware-core/smoke/flows. The real defaultBackfillSpawn and hyp backfill --json contract are only exercised through injected fakes — which is why the process.env bug above is invisible to the suite.

Minor — reconcile()'s marker read is not corruption-tolerant

readMarkerStore does JSON.parse(raw) outside any catch (action_reconciler.js:280), so a corrupt marker file makes every reconcile() pass throw and no action ever runs/recovers — permanently wedging the level-triggered recovery guarantee, while hyp status (which catches) keeps reporting cleanly and hides the wedge. Treat an unparseable marker as empty, like readClientActionStatus does.
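A corruption-tolerant read paired with an atomic tmp+rename write, sketched with Node's synchronous `fs` API. `readMarkerStore` is the name the review cites, but this body and `writeMarkerStore` are assumptions: a missing or unparseable file yields an empty store, while other I/O errors still propagate.

```javascript
const fs = require('node:fs');

// A missing or unparseable marker file is treated as an empty store, so one
// corrupt file cannot make every reconcile() pass throw.
function readMarkerStore(file) {
  let raw;
  try {
    raw = fs.readFileSync(file, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return {};
    throw err; // other I/O errors are real problems
  }
  try {
    const parsed = JSON.parse(raw);
    return parsed && typeof parsed === 'object' && !Array.isArray(parsed) ? parsed : {};
  } catch {
    return {}; // corrupt JSON: start empty; the next write replaces the file
  }
}

// Atomic write: write a 0600 temp file beside the target, then rename over it.
function writeMarkerStore(file, store) {
  const tmp = `${file}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(store, null, 2) + '\n', { mode: 0o600 });
  fs.renameSync(tmp, file);
}
```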


🤖 neutral-reconcile dual review (worktree-isolated). Held for a human — neutral never merges.

philcunliffe and others added 2 commits June 25, 2026 22:32
Addresses five actionable findings from the dual review, each correctness
fix paired with a regression test that fails before and passes after.

1. [MAJOR] Backfill child spawned with process.env, not the daemon's
   resolved env. Thread the daemon env (HYP_HOME forced to hypHome) through
   ReconcileInput -> ActionContext -> spawn() so the import writes the same
   cache the daemon booted, even on the direct-runDaemon/hermetic path
   (LLP 0041 Run-once flow step 2).

2. [MAJOR] Per-plugin backfill validators were dead in production. Thread
   the live boot.runtime.configRegistry into buildConfigApplyDeps ->
   validateConfig so apply-time validation dispatches to the claude/codex
   config_sections validators. Also tighten readBackfillPolicy: a present
   non-boolean on_join (e.g. "false") is an opt-out, not fail-open.

3. [MAJOR] Cover the fresh-join confirm-edge path: a new daemon-reconcile
   test boots under active probation (no boot pass), drives the edge via
   the real configControl.confirmPoll() seam, and asserts exactly one pass
   runs with the effective config + backfill registry + resolved HYP_HOME.

4. [MINOR] Default-on backfill (enabled client, no explicit config.backfill)
   was invisible in hyp status. Align buildClientActionsReport with
   backfillHandler.desired() using the catalog client descriptors as the
   static backfill-provider proxy, gated on a joined host so a non-joined
   install keeps its V1 surface.

5. [MINOR] reconcile()'s marker read now tolerates a corrupt marker
   (unparseable -> empty store) like hyp status already does, so a corrupt
   file can't wedge all client actions.

Tests: test/core/action-backfill.test.js (env + non-boolean on_join),
test/core/action-reconciler.test.js (corrupt marker),
test/core/daemon-reconcile.test.js (confirm edge),
test/core/status-client-actions.test.js (default-on pending),
test/core/config-apply-section-validators.test.js (live section validator).
npm test green (1453 pass, 1 pre-existing skip); typecheck + lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Close the three residual review findings:

1. Status/reconciler on_join inconsistency. status.js read the policy
   inline as `on_join !== false`, treating a malformed `on_join: "false"`
   (string) as default-on (pending forever), while the reconciler's
   readBackfillPolicy treats a non-boolean on_join as opt-out. Extract the
   tri-state read into a shared src/core/config/backfill_policy.js and use
   it in BOTH action_backfill.js and status.js so they can never disagree.
   Regression: status renders n/a (not pending) for a malformed on_join on
   a joined host.

2. Boot re-validation configRegistry. At boot.js the merge-time validate
   runs during config resolution, BEFORE activatePlugins registers any
   config_sections validators, so the runtime registry is empty there.
   Threading it would be a no-op giving false confidence. Documented why
   it is intentionally omitted; apply-time validation is the populated gate.

3. Introduce-new-plugin apply validation. A central config that first
   introduces a backfill-capable plugin (e.g. @hypaware/claude) skipped its
   config.backfill validation because the live registry only carries
   validators for already-active plugins. Plugins now expose a
   side-effect-free `configSection` export; apply discovers introduced
   plugins' validators from disk (never runs activate(), which would clobber
   live module singletons like ai-gateway's runtime) and routes each plugin
   to live-or-discovered. Tests boot WITHOUT claude/codex and prove an
   introduced malformed backfill block is rejected.
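The live-or-discovered routing can be sketched as a lookup with a fallback. `pickSectionValidator`, the `Map`-shaped registry, and `discoverConfigSection` are hypothetical; the property being illustrated is that discovery only reads a `configSection` export and never calls `activate()`.

```javascript
// Hypothetical routing: prefer the live registry's validator for an already
// active plugin; otherwise fall back to a validator discovered from the
// plugin's side-effect-free `configSection` export (activate() is never run).
function pickSectionValidator(pluginName, liveRegistry, discoverConfigSection) {
  const live = liveRegistry.get(pluginName);
  if (live) return { source: 'live', validate: live };
  const section = discoverConfigSection(pluginName);
  if (section && typeof section.validate === 'function') {
    return { source: 'discovered', validate: section.validate };
  }
  return null; // no plugin-specific validator to run
}
```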

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
philcunliffe marked this pull request as ready for review June 26, 2026 06:49
philcunliffe commented (Contributor, Author)

Round-2 review complete → ready for merge

All round-1 findings are resolved and verified (each regression test confirmed to fail on the pre-fix code), and the round-2 residuals are closed (commit b2613f7):

  • Resolved (round-1, 80172eb): env/HYP_HOME threaded to the backfill child; configRegistry wired into apply-time validation + readBackfillPolicy no longer fails open; fresh-join confirm-edge now tested; default-on backfill visible in hyp status; corrupt marker no longer wedges reconcile().
  • Round-2 (b2613f7): the on_join status/reconciler inconsistency fixed via a shared backfill_policy helper (+ regression); and the deeper validator gap a reviewer flagged — a central config that introduces @hypaware/claude/codex is now validated via a side-effect-free configSection export + an import-without-activate() discovery module (+ regression proving the introduce-a-new-plugin rejection).
  • Documented decision: boot-time re-validation intentionally passes no configRegistry (it runs before plugin activation; the registry is empty there) — the apply path is the populated gate.

CI green (lint/test/typecheck; 1455 tests pass). Reviewed via two worktree-isolated dual-review rounds (Codex + Claude). Moved to ready and held — neutral does not merge.

philcunliffe merged commit 5964fed into master Jun 26, 2026
6 checks passed
philcunliffe deleted the integration/central-config-client-actions branch June 26, 2026 15:03