Fix tractable findings from PR #678 multi-model review #686

Open
eanzhao wants to merge 4 commits into dev from fix/2026-05-19_pr678-review-fixes

eanzhao (Contributor) commented May 19, 2026

Context

PR #678 (Refactor iter1: clean 3 architectural violations + add codex-refactor-loop skill)
was merged into dev (merge commit 601d92b2) before a post-merge review completed.
This follow-up PR carries that review as its body and fixes every tractable finding.

The review ran 6 models over the #678 diff via /opencode-pr-review.
deepseek-v4-pro · glm-5.1 · mimo-v2.5-pro · codex produced parseable reviews;
kimi timed out and gemini OOM'd — both excluded. 23 raw clusters were
consolidated into ~18 distinct findings; each is dispositioned below.

| Finding | Sev | Models | Disposition |
|---|---|---|---|
| M1 hardcoded dev paths in skill prompts | major | glm-5.1 | ✅ fixed |
| M4 IProjectionSessionEventCodec<AGUIEvent> DI collision | major | codex | ✅ fixed |
| M5 projector synthesizes StateVersion | major | codex | ✅ fixed |
| M6 fire-and-forget tool coordinator | major | v4-pro + mimo | ✅ fixed |
| m1 misleading OpenTelemetry comment | minor | glm-5.1 + mimo | ✅ fixed |
| M3 HouseholdEntityToolSource DI param | major | glm-5.1 | ✔️ verified OK |
| M2 reasoning collects only DeltaContent | major | v4-pro | 📝 correct as-is |
| M7 hook-rewrite-to-destructive continues | major | codex | 📝 pre-existing |
| M8 2-min timeout now covers dispatch | major | v4-pro | 📝 correct as-is |
| M9 unauthenticated actor-query endpoints | major | codex + v4-pro | 🚫 PR-acknowledged |
| 8 minor findings | minor | various | 📝 see below |

✅ Fixed

M4 — IProjectionSessionEventCodec<AGUIEvent> DI collision · major (codex) — PR-introduced

GAgentService.Projection registered GAgentDraftRunSessionEventCodec and NyxidChat
registered NyxIdChatSessionEventCodec — both as IProjectionSessionEventCodec<AGUIEvent>
via TryAddSingleton, so the second registration was a silent no-op and a pipeline
could run on the wrong codec/channel. Script-service AGUI had no codec at all and rode
on the draft-run one. Fix: each pipeline (NyxId chat, draft-run, script-service)
now builds its own channel-scoped ProjectionSessionEventHub<AGUIEvent> via a factory
registration; adds a dedicated ScriptServiceAguiSessionEventCodec. The hub holds no
mutable state, so a per-consumer instance is safe.
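
The collision and the factory-based fix can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the Microsoft.Extensions.DependencyInjection package, and ICodec, Hub, ChatPipeline and DraftRunPipeline are hypothetical stand-ins for the real codec, hub and pipeline types.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins; the real codec and hub types live in the Aevatar projects.
public interface ICodec { string Channel { get; } }
public sealed class DraftRunCodec : ICodec { public string Channel => "draft-run"; }
public sealed class ChatCodec : ICodec { public string Channel => "nyxid-chat"; }

public sealed class Hub
{
    public Hub(ICodec codec) => Codec = codec;
    public ICodec Codec { get; }
}

// One consumer type per pipeline, so each resolves its own hub instance.
public sealed class ChatPipeline
{
    public ChatPipeline(Hub hub) => Hub = hub;
    public Hub Hub { get; }
}

public sealed class DraftRunPipeline
{
    public DraftRunPipeline(Hub hub) => Hub = hub;
    public Hub Hub { get; }
}

public static class CodecCollisionDemo
{
    // Before: two modules register the same service type via TryAddSingleton.
    public static string ResolveSharedCodecChannel()
    {
        var services = new ServiceCollection();
        services.TryAddSingleton<ICodec, DraftRunCodec>();
        services.TryAddSingleton<ICodec, ChatCodec>(); // no-op: ICodec is already registered
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<ICodec>().Channel;
    }

    // After: each pipeline is built by a factory that wires its own codec and hub.
    public static (string Chat, string DraftRun) ResolveScopedHubChannels()
    {
        var services = new ServiceCollection();
        services.AddSingleton(_ => new ChatPipeline(new Hub(new ChatCodec())));
        services.AddSingleton(_ => new DraftRunPipeline(new Hub(new DraftRunCodec())));
        using var provider = services.BuildServiceProvider();
        return (provider.GetRequiredService<ChatPipeline>().Hub.Codec.Channel,
                provider.GetRequiredService<DraftRunPipeline>().Hub.Codec.Channel);
    }
}
```

In this sketch ResolveSharedCodecChannel returns "draft-run" regardless of which pipeline asks, which is the silent no-op described above; the factory variant gives each pipeline the codec it expects.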

M5 — UserAgentCatalogProjector synthesizes StateVersion · major (codex)

The projector merges two authoritative sources (catalog + runner) and sets
StateVersion = CatalogSourceVersion + RunnerSourceVersion. It previously had no
per-source staleness guard, so a stale/replayed event for one source could roll that
source's version back and re-write older fields. Fix: added per-source monotonic guards:
a catalog/runner commit not newer than the document's stored version for that source
is skipped. With the guards every accepted event strictly advances one source, so the
sum is strictly monotonic per document and a valid overwrite watermark. The honest
per-source versions are retained; a fully vector-typed StateVersion would need a
proto schema change (noted for a dedicated change).
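
A rough sketch of the per-source guard idea, under stated assumptions: CatalogDocument, SourceGuards and the two payload fields are hypothetical names for illustration, and only CatalogSourceVersion, RunnerSourceVersion and StateVersion come from the description above.

```csharp
// Hypothetical read-model document; only the three version names come from the PR text.
public sealed class CatalogDocument
{
    public long CatalogSourceVersion { get; set; }
    public long RunnerSourceVersion { get; set; }

    // Sum of the per-source versions, used as the overwrite watermark.
    public long StateVersion => CatalogSourceVersion + RunnerSourceVersion;

    public string CatalogName { get; set; } = "";
    public string RunnerStatus { get; set; } = "";
}

public static class SourceGuards
{
    // Applies a catalog commit only if it is strictly newer than the stored catalog version.
    public static bool TryApplyCatalog(CatalogDocument doc, long eventVersion, string name)
    {
        if (eventVersion <= doc.CatalogSourceVersion)
            return false; // stale or replayed event: skip, keep newer fields

        doc.CatalogSourceVersion = eventVersion;
        doc.CatalogName = name;
        return true;
    }

    // Same guard for the runner source, tracked independently of the catalog source.
    public static bool TryApplyRunner(CatalogDocument doc, long eventVersion, string status)
    {
        if (eventVersion <= doc.RunnerSourceVersion)
            return false;

        doc.RunnerSourceVersion = eventVersion;
        doc.RunnerStatus = status;
        return true;
    }
}
```

Because every accepted event raises exactly one of the two addends and rejected events change nothing, the sum can only increase in this sketch, which is the property the watermark relies on.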

M6 — StreamingToolExecutor coordinator started fire-and-forget · major (v4-pro + mimo, consensus)

_ = RunCoordinatorAsync() — a fault in the loop was unobserved, the loop silently
died, and GetRemainingResultsAsync hung forever. Fix: the coordinator task is
retained; RequestStateAsync races the request against it so faults surface to the
caller; Dispose observes a faulted task. Regression test
CoordinatorFault_ShouldSurfaceToCaller_NotHang added.
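
The retain-and-race pattern can be sketched as below. This is an illustration of the technique only, not the StreamingToolExecutor code: CoordinatorClient and RequestAsync are hypothetical names, and the real RequestStateAsync may differ in shape.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical wrapper showing how a retained coordinator task surfaces faults.
public sealed class CoordinatorClient : IDisposable
{
    // Retained instead of being discarded with `_ = RunCoordinatorAsync()`.
    private readonly Task _coordinator;

    public CoordinatorClient(Func<Task> runCoordinator) => _coordinator = runCoordinator();

    // Races the request against the coordinator so a dead loop cannot hang the caller.
    public async Task<T> RequestAsync<T>(Task<T> request)
    {
        Task first = await Task.WhenAny(request, _coordinator).ConfigureAwait(false);
        if (ReferenceEquals(first, request))
            return await request.ConfigureAwait(false);

        // The coordinator finished first: awaiting it rethrows its fault, if any.
        await _coordinator.ConfigureAwait(false);
        throw new InvalidOperationException("Coordinator stopped before the request completed.");
    }

    public void Dispose()
    {
        // Reading Exception marks a faulted task as observed.
        if (_coordinator.IsFaulted)
            _ = _coordinator.Exception;
    }
}
```

With a coordinator that throws and a request that never completes, RequestAsync in this sketch throws the coordinator's exception instead of waiting forever.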

M1 — hardcoded /Users/auric/aevatar/ paths in codex-refactor-loop prompts · major (glm-5.1)

remote-ci-fix.md / test-add.md: hardcoded paths replaced with $REPO_ROOT/, matching the convention already used by audit.md / implement.md / verify.md.

m1 — misleading OpenTelemetry version comment · minor (glm-5.1 + mimo)

Directory.Packages.props comment claimed the variable holds 1.15.3; it holds 1.15.1.
Comment corrected — security-critical packages are individually pinned to 1.15.3.

✔️ Verified — no fix needed

  • M3 (glm-5.1): HouseholdEntityToolSource is registered by-type, so DI auto-supplies
    the new IActorDispatchPort ctor param; tests construct it directly. No change needed.

📝 Reviewed — current code is correct (no change, with reasons)

  • M2 (v4-pro): HouseholdEntity reasoning aggregates chunk.DeltaContent. For a normal
    model this reconstructs exactly the assistant answer the old ChatAsync returned; the
    decision text (NO_ACTION) is content, not reasoning. Folding in DeltaReasoningContent
    would pollute the decision with chain-of-thought — a regression. Correct as-is.
  • M7 (codex): StreamingToolExecutor:326 hook-rewrite-to-destructive "records error +
    continues": this is pre-existing behavior (the old code set _hasErrored and
    continued too); the refactor only changed the mechanism. Not a #678 regression.
  • M8 (v4-pro): ScopeServiceEndpoints 2-min linked timeout now covers RunRuntimeAsync.
    Dispatch is accepted-only (fast); the timeout also usefully bounds a stuck mailbox.
    v4-pro's suggested split would leave dispatch unbounded — current code is acceptable.

📝 Minor findings — reviewed

  • NyxIdChatStreamingRunner:48 finally-block OCE — the exception falls through to the
    outer cancellation handler; behavior is correct.
  • ScopeBindingCommandApplicationService "hardcoded PropagationStage" (mimo) — the named
    fields (AcceptanceStage / PropagationStage) do not exist in the code; inaccurate finding.
  • StreamingToolExecutor:155 GetCompletedResults race — reviewer notes it self-resolves.
  • StreamingToolExecutor:198 Dispose race — mitigated by the M6 fix (RequestStateAsync
    now throws instead of hanging).
  • ConnectedServiceSpecCache dual constructor — deliberate & documented (// Fix (remote-ci...): DI ValidateService ctor-ambiguity workaround). Left intact by design.
  • NyxIdSpecCatalog singleton disposal — already guarded (Interlocked, OCE catches,
    finally cleanup); the fire-and-forget cleanup is acceptable for a shutdown singleton.
  • UserAgentCatalogProjector runner-id fallback — the runner branch only runs after a
    successful TryUnpackState<SkillRunnerState>; the PublisherActorId fallback is sound.
  • architecture_guards.sh tools/ scan — currently passes; the concern is hypothetical.

🚫 Out of scope — already acknowledged by PR #678

Test plan

  • dotnet build — affected projects build
  • Aevatar.GAgents.ChannelRuntime.Tests — 820/820 pass (M5)
  • Aevatar.AI.Tests — 556/556 pass (M4 NyxId chat, M6)
  • Aevatar.GAgentService.Tests — M4/M5 add 0 new failures
  • tools/ci/architecture_guards.sh — all guards pass
  • tools/ci/test_stability_guards.sh — passes
  • Remote CI green on dev base

ℹ️ Also fixed — 3 pre-existing stale tests (b4ef987c): ServiceProjectionInfrastructureTests,
ServiceServingProjectionInfrastructureTests and ServiceConfigurationProjectionInfrastructureTests
asserted ImplementationType against the bare projector type, but AddProjectionArtifactMaterializer /
AddCurrentStateProjectionMaterializer wrap materializers in the Observed* observability decorator.
These were red on dev independently of #678; the assertions now verify the projector via
ImplementationType.GenericTypeArguments.
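
A minimal sketch of asserting through a decorator's generic arguments, assuming hypothetical types: ObservedMaterializer stands in for the internal Observed* decorators, and WrapsProjector is an illustrative helper rather than the assertion used in the tests.

```csharp
using System;
using System.Linq;

public sealed class SampleContext { }
public sealed class SampleProjector { }

// Hypothetical stand-in for the internal Observed*Materializer<TContext, TProjector> decorators.
public sealed class ObservedMaterializer<TContext, TProjector> { }

public static class DecoratorAssertions
{
    // True when implementationType is a closed generic whose type arguments include projectorType.
    public static bool WrapsProjector(Type implementationType, Type projectorType) =>
        implementationType.IsGenericType &&
        implementationType.GenericTypeArguments.Contains(projectorType);
}
```

A test can then check a service descriptor's ImplementationType with this helper without naming the decorator type itself.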

🤖 Generated with Claude Code

A post-merge multi-model review of PR #678 (merged into dev as 601d92b)
surfaced ~18 distinct findings. This addresses the three that are safely
fixable without architectural rework:

- StreamingToolExecutor: the channel coordinator was started fire-and-forget,
  so a fault inside the loop was unobserved — the loop silently died and
  GetRemainingResultsAsync hung forever. Retain the coordinator task, race
  state requests against it so faults surface to the caller, and observe the
  task on Dispose. Adds regression test
  CoordinatorFault_ShouldSurfaceToCaller_NotHang.
- codex-refactor-loop skill prompts: replace hardcoded /Users/auric/aevatar/
  paths in remote-ci-fix.md and test-add.md with $REPO_ROOT/, matching the
  convention already used by audit.md / implement.md / verify.md.
- Directory.Packages.props: correct the OpenTelemetry comment that claimed the
  shared variable holds 1.15.3 when it actually holds 1.15.1.

Remaining findings (IProjectionSessionEventCodec<AGUIEvent> DI collision,
UserAgentCatalogProjector StateVersion synthesis, etc.) are documented in the
PR description; they need dedicated design work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 96.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.44%. Comparing base (601d92b) to head (b4ef987).
⚠️ Report is 9 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Aevatar.AI.Core/Tools/StreamingToolExecutor.cs | 84.61% | 2 Missing ⚠️ |

@@            Coverage Diff             @@
##              dev     #686      +/-   ##
==========================================
+ Coverage   82.42%   82.44%   +0.01%     
==========================================
  Files         938      939       +1     
  Lines       59753    59788      +35     
  Branches     7831     7830       -1     
==========================================
+ Hits        49251    49290      +39     
+ Misses       7128     7125       -3     
+ Partials     3374     3373       -1     

| Flag | Coverage Δ |
|---|---|
| ci | 82.44% <96.66%> (+0.01%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...DependencyInjection/ServiceCollectionExtensions.cs | 68.44% <100.00%> (+2.30%) ⬆️ |
| ...rchestration/ScriptServiceAguiSessionEventCodec.cs | 100.00% <100.00%> (ø) |
| src/Aevatar.AI.Core/Tools/StreamingToolExecutor.cs | 86.37% <84.61%> (-0.18%) ⬇️ |

... and 9 files with indirect coverage changes

eanzhao and others added 3 commits May 19, 2026 16:03
…view)

M4 — IProjectionSessionEventCodec<AGUIEvent> DI collision: NyxId chat, GAgent
draft-run and script-service AGUI all projected AGUIEvent through one shared
IProjectionSessionEventHub<AGUIEvent>; TryAddSingleton silently dropped
whichever module's codec registered second, so a pipeline could run on the
wrong codec/channel. Each pipeline now builds its own channel-scoped hub via a
factory registration; adds a dedicated ScriptServiceAguiSessionEventCodec (it
previously rode on the draft-run codec through the shared registration).

M5 — UserAgentCatalogProjector StateVersion: the projector merges two
authoritative sources (catalog + runner) and synthesized StateVersion as their
sum. Add per-source monotonic guards so a stale/replayed event for one source
cannot roll its version back or re-write older fields over newer ones; with the
guards the sum is strictly monotonic per document and a valid overwrite
watermark. The honest per-source versions are kept; a vector-typed StateVersion
would need a proto schema change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three *ProjectionInfrastructureTests asserted `ImplementationType ==
typeof(<Projector>)` for projection materializers. AddProjectionArtifactMaterializer
/ AddCurrentStateProjectionMaterializer register materializers wrapped in the
Observed*Materializer<TContext, TProjector> observability decorator, so
ImplementationType is the decorator type — not the bare projector. These tests
had been red on dev independently of PR #678 and of this branch's fixes.

Assert the decorator wraps the expected projector via
ImplementationType.GenericTypeArguments instead (the Observed* decorators are
internal to Aevatar.CQRS.Projection.Core and cannot be named from this test
assembly). 10 assertions across 3 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>