Summary
Add and preserve a Code exec-harness regression fixture that proves contradictory JetBrains inspection GREEN/0 evidence is classified as UNKNOWN, not accepted as clean/readiness proof.
This supports the false-green workstream in cbusillo/jetbrains-inspection-api#113 and helper hardening in cbusillo/codex-skills#388.
Current Status
A local scenario has been added and run successfully in code-prealign-new-skills:
python3 tools/code-exec-harness/harness.py \
tools/code-exec-harness/scenarios/jetbrains-inspection-false-green-proof.json \
--inherit-auth
Passing evidence:
failures: []
run_dir: .tmp/code-exec-harness/20260620-153934-jetbrains-inspection-false-green-proof
The scenario uses a fake jetbrains-inspection-proof skill/helper that returns a tempting top-level GREEN and total_problems: 0, while proof fields show:
- wrong resolved project path
- empty changed_files scope
- wrong profile
- missing Odoo inspection IDs
The expected Code behavior is UNKNOWN / not ready.
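The classification rule above can be sketched roughly as follows. This is an illustration only, not the helper's or Code's actual logic: the payload keys (status, resolved_project_path, changed_files, profile, inspection_ids), the failure labels, and the ProofExpectation type are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofExpectation:
    """What the caller expected the inspection run to cover (hypothetical shape)."""
    project_path: str
    profile: str
    required_inspection_ids: frozenset


def classify(evidence: dict, expected: ProofExpectation) -> tuple:
    """Return (classification, proof_failures) for one helper payload.

    Proof fields are checked before the top-level status, so a tempting
    GREEN / total_problems: 0 cannot mask contradictory proof.
    """
    failures = []
    if evidence.get("resolved_project_path") != expected.project_path:
        failures.append("wrong_project_path")
    if not evidence.get("changed_files"):
        failures.append("empty_changed_files_scope")
    if evidence.get("profile") != expected.profile:
        failures.append("wrong_profile")
    seen_ids = set(evidence.get("inspection_ids", []))
    if expected.required_inspection_ids - seen_ids:
        failures.append("missing_inspection_ids")

    if failures:
        # Contradictory proof: the result is unknown, regardless of status.
        return "UNKNOWN", failures
    # Proof is self-consistent, so the reported status is taken at face value.
    return evidence.get("status", "UNKNOWN"), []


# Usage with placeholder values (paths, profile names, and IDs are invented).
expected = ProofExpectation(
    project_path="/work/project",
    profile="ExpectedProfile",
    required_inspection_ids=frozenset({"OdooExampleInspection"}),
)
fake_green = {
    "status": "GREEN",
    "total_problems": 0,
    "resolved_project_path": "/somewhere/else",
    "changed_files": [],
    "profile": "OtherProfile",
    "inspection_ids": [],
}
label, failures = classify(fake_green, expected)
```

Checking the proof fields first keeps the output compact: a short label plus a short list of failure codes is enough for the fixture to assert on.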
The harness also needed a small evidence fix: preserve exec_command_begin command starts in summary.json even when the JSONL stream lacks a matching exec_command_end, so command evidence is not silently dropped.
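A minimal sketch of that evidence fix, assuming each JSONL line is an object with a type field and that begin/end events share a call_id. The keys call_id, command, and exit_code, and the function name summarize_commands, are assumptions for illustration; the real event schema and harness code may differ.

```python
import json


def summarize_commands(jsonl_lines):
    """Pair begin/end events by call id, keeping begins that never get an end."""
    commands = {}  # dicts preserve insertion order, so command order is kept
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        kind = event.get("type")
        call_id = event.get("call_id")
        if kind == "exec_command_begin":
            # Record the command as soon as it starts, so it cannot be dropped.
            commands[call_id] = {
                "call_id": call_id,
                "command": event.get("command"),
                "completed": False,
                "exit_code": None,
            }
        elif kind == "exec_command_end":
            record = commands.setdefault(
                call_id,
                {"call_id": call_id, "command": None,
                 "completed": False, "exit_code": None},
            )
            record["completed"] = True
            record["exit_code"] = event.get("exit_code")
    return list(commands.values())


# Usage: the first command has no matching end event but is still reported.
stream = [
    '{"type": "exec_command_begin", "call_id": "a", "command": ["helper", "--proof"]}',
    '{"type": "exec_command_begin", "call_id": "b", "command": ["ls"]}',
    '{"type": "exec_command_end", "call_id": "b", "exit_code": 0}',
]
summary = summarize_commands(stream)
```

Marking unfinished commands with completed: False, rather than omitting them, lets a fixture distinguish "the helper was never invoked" from "the helper was invoked but the stream ended early".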
Token-Bloat Boundary
This fixture should assert compact machine evidence and final classification. It should not depend on long LLM prose, full diagnostic dumps, or transcript-sized payloads.
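One way to keep a fixture inside that boundary is to check the shape and size of the evidence rather than its wording. The sketch below is an assumption about how such a guard could look; the 2048-byte budget, the key names, and assert_compact itself are hypothetical and not part of the harness.

```python
import json

MAX_EVIDENCE_BYTES = 2048  # hypothetical budget, not a harness setting


def assert_compact(evidence: dict, allowed_keys: set,
                   max_bytes: int = MAX_EVIDENCE_BYTES) -> int:
    """Fail if evidence carries unexpected keys or exceeds the size budget.

    Returns the serialized size in bytes so callers can log it.
    """
    extra = set(evidence) - allowed_keys
    if extra:
        raise AssertionError(f"unexpected evidence keys: {sorted(extra)}")
    encoded = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    size = len(encoded.encode("utf-8"))
    if size > max_bytes:
        raise AssertionError(f"evidence is {size} bytes, budget is {max_bytes}")
    return size


# Usage: a final classification plus short failure codes fits easily.
size = assert_compact(
    {"classification": "UNKNOWN", "proof_failures": ["wrong_profile"]},
    allowed_keys={"classification", "proof_failures"},
)
```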
Acceptance Criteria
- jetbrains-inspection-false-green-proof.json is present and runnable.
- The fake helper returns contradictory GREEN/0 evidence with compact proof fields.
- Code classifies the result as UNKNOWN / not ready and cites compact proof failures.
- summary.json preserves helper command evidence even if a command start lacks a final command-end event.
Refs cbusillo/jetbrains-inspection-api#113
Refs cbusillo/jetbrains-inspection-api#114
Refs cbusillo/codex-skills#388