From b81a597231ffc05a7387d1541cb0f3d474fd2641 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 14:52:59 +0530 Subject: [PATCH 01/10] Document app answer release contract Signed-off-by: docushell-admin --- README.md | 6 +- SPEC.md | 2 + docs/app-answer-release-contract.md | 122 ++++++++++++++++++++++++++++ 3 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 docs/app-answer-release-contract.md diff --git a/README.md b/README.md index 761087b..aaddc40 100644 --- a/README.md +++ b/README.md @@ -240,6 +240,9 @@ canonical JSON report. Rust callers can derive the same view from submitted is not certified. `unverified` means no check is reusable. Final grounded answers should be assembled only from reusable grounded checks, and retrieval citations, model-returned evidence IDs, or answer text are not proof until checked against a grounding source. +Apps that also need question relevance or synthesis policy before releasing answer text should +apply the separate [`app answer release contract`](docs/app-answer-release-contract.md) above the +Ethos grounding check. ## Evidence anchoring @@ -384,7 +387,8 @@ The adapter owns the mapping from parser-native structures into Ethos evidence c elements, text, tables, regions, fingerprints, and capability declarations. The verifier then checks whether caller-provided citations bind to that source evidence. Product layers can use `VerificationReport::proof_summary()` for release wording, but the canonical report remains the -audit artifact. +audit artifact. Apps that release final answer text should also apply the separate +[`app answer release contract`](docs/app-answer-release-contract.md). Start with [`docs/bring-your-own-parser.md`](docs/bring-your-own-parser.md). Use the OpenDataLoader adapter as the fuller reference once the minimal `GroundingSource` shape is clear. 
diff --git a/SPEC.md b/SPEC.md index 3197b5c..71e1bee 100644 --- a/SPEC.md +++ b/SPEC.md @@ -426,6 +426,8 @@ product wording and release policy, but the canonical `VerificationReport` remai artifact. If a wrapper exposes an `invalid_request` status, it is a process or API envelope for malformed input and MUST NOT be derived from a `VerificationReport`. +Application layers that also need to decide question relevance, synthesis policy, or final answer +release SHOULD apply `docs/app-answer-release-contract.md` above the Ethos grounding result. ### 6.3 Evidence anchor rules diff --git a/docs/app-answer-release-contract.md b/docs/app-answer-release-contract.md new file mode 100644 index 0000000..95ed63a --- /dev/null +++ b/docs/app-answer-release-contract.md @@ -0,0 +1,122 @@ +# App Answer Release Contract + +Status: source-only guidance for products, APIs, and UI layers that consume Ethos verification +reports. + +This document defines how an application should decide what answer text it may show after Ethos +checks citation grounding. It does not add a new verifier status, JSON report field, command, +hosted API, parser adapter, or semantic judge. + +## Boundary + +Ethos answers one deterministic question: + +```text +Do these caller-provided citations bind to source evidence exposed by a trusted GroundingSource? +``` + +The canonical audit artifact remains `verification_report.json`, governed by +`schemas/ethos-verification-report.schema.json`. Rust callers may derive +`VerificationReport::proof_summary()` and Python callers may derive `proof_summary(report)` for +product wording, reusable-check selection, and API wrapper policy. That derived summary is not a +replacement for the canonical report. + +If a wrapper exposes `invalid_request`, that status is a process or API envelope for malformed +input, invalid configuration, adapter failure, or usage errors. It is not derived from a +`VerificationReport`. 
+ +## Three Axes + +Application answer release decisions should keep three axes separate. + +| Axis | Owner | Question | +| --- | --- | --- | +| Citation grounding | Ethos | Does the cited evidence ID, quote, value, table cell, or target exist and match the trusted source evidence? | +| Question relevance | Application | Does the grounded evidence actually answer or support the user's question? | +| Synthesis level | Application | Is the claim directly stated by the source, or is it an inference across multiple grounded facts? | + +Ethos is strongest on citation grounding. It does not know the user's question unless a wrapper +adds that context above the verifier, and it does not silently convert grounded snippets into a +semantically complete answer. + +## Grounding Status + +Use the derived proof summary as the grounding axis only: + +- `verified`: the submitted request is certified by `all_evidence_grounded`. +- `partially_verified`: some checks are reusable, but the request as submitted is not certified. +- `unverified`: no check is reusable. + +Reusable grounded checks must satisfy all of these conditions: + +- the check status is `grounded`; +- `semantic_unverified` is false; +- `fingerprint_stale` is false. + +Capability limits and proof limitations must stay visible to users or downstream policy, but they +are not automatic proof failures by themselves. They describe what Ethos could not prove at the +available adapter/source fidelity. + +## Application Labels + +An application should label every generated claim before deciding whether to show it as final +answer text. + +Suggested `question_relevance` values: + +- `direct_answer`: the grounded evidence directly answers the user question. +- `supports_answer`: the grounded evidence is needed to support the answer but is not sufficient + alone. +- `background_only`: the grounded evidence is true but not responsive to the question. +- `unrelated`: the grounded evidence does not support the requested answer. 
+ +Suggested `claim_type` values: + +- `source_fact`: the claim is directly stated by source evidence. +- `synthesis`: the claim combines multiple grounded facts or adds reasoning across them. +- `unsupported`: the claim cannot be traced to grounded source evidence. + +These labels may come from application policy, a reviewed model output schema, human review, or a +separate evaluator. They are outside the canonical Ethos verification report. + +## Release Rules + +A conservative first application policy is: + +| App status | Rule | Default UI treatment | +| --- | --- | --- | +| `certified` | Citation grounding is true, `question_relevance` is `direct_answer` or `supports_answer`, and `claim_type` is `source_fact`. | Show in the final answer. | +| `partial_certified` | At least one claim is `certified`, and at least one requested claim is blocked or review-only. | Show only certified claims; disclose that the answer is partial. | +| `supported_synthesis_needs_review` | Citation grounding is true, `question_relevance` is `direct_answer` or `supports_answer`, and `claim_type` is `synthesis`. | Keep out of the main answer unless the product explicitly allows reviewed synthesis. | +| `grounded_but_irrelevant` | Citation grounding is true, but `question_relevance` is `background_only` or `unrelated`. | Block from the final answer. | +| `cannot_answer_from_sources` | No relevant grounded source facts are available. | Say that the sources do not support an answer. | + +This preserves the strict Ethos rule that grounded citations are necessary, while avoiding the +separate failure mode where a true citation supports the wrong answer. 
+ +## Wrapper Requirements + +Applications that use Ethos as a release gate should: + +- build verification requests from trusted source maps or parser artifacts, not from model-returned + citation IDs alone; +- pass the original user question into the application gate that labels relevance and synthesis; +- release final answer text from certified source facts by default; +- put synthesis in a separate review surface unless the product has an explicit synthesis policy; +- treat retrieval citations, chunks, and model-selected evidence IDs as candidates, not proof; +- keep `verification_report.json` available for audit even when a derived status is shown to users. + +The user-facing copy should avoid saying "Ethos verified the answer" unless the application has +also checked relevance and synthesis. Safer wording is: + +```text +Ethos verified citation grounding. +Answer relevance: direct, partial, or off-topic. +``` + +## Parser Neutrality + +This contract is not DocuShell-specific. Any parser can participate when its output is adapted into +the parser-neutral `GroundingSource` boundary and the application supplies citation claims for +Ethos to check. The application still owns retrieval, question relevance, synthesis policy, and +final answer release. 
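Review note on PATCH 01/10: the Release Rules table and the reusable-check conditions in `docs/app-answer-release-contract.md` can be read as a small per-claim decision function. The following is a minimal sketch under stated assumptions, not the project's implementation. The function names `is_reusable_check` and `claim_release`, the `status` key on a check, and the order in which the rules are tried are all hypothetical; the contract table does not state a precedence. Only the label values and the three reusable-check conditions come from the document.

```python
from typing import Any, Mapping, Tuple

# Label vocabularies taken from the contract document.
ANSWER_RELEVANT = frozenset({"direct_answer", "supports_answer"})
ANSWER_IRRELEVANT = frozenset({"background_only", "unrelated"})
CLAIM_TYPES = frozenset({"source_fact", "synthesis", "unsupported"})


def is_reusable_check(check: Mapping[str, Any]) -> bool:
    """Apply the three reusable-check conditions from the contract.

    Assumption: the check status lives under a "status" key; the contract
    names the conditions but not the exact report layout.
    """
    return (
        check.get("status") == "grounded"
        and not check.get("semantic_unverified", False)
        and not check.get("fingerprint_stale", False)
    )


def claim_release(
    citation_grounded: bool,
    question_relevance: str,
    claim_type: str,
) -> Tuple[str, str]:
    """Return (release_action, release_reason) for one labeled claim.

    Assumption: grounding is checked first, then relevance, then claim type.
    """
    if question_relevance not in ANSWER_RELEVANT | ANSWER_IRRELEVANT:
        raise ValueError(f"unknown question_relevance: {question_relevance!r}")
    if claim_type not in CLAIM_TYPES:
        raise ValueError(f"unknown claim_type: {claim_type!r}")

    # Without grounded citations nothing can be released.
    if not citation_grounded or claim_type == "unsupported":
        return ("block", "cannot_answer_from_sources")
    # A true citation that does not answer the question is still blocked.
    if question_relevance in ANSWER_IRRELEVANT:
        return ("block", "grounded_but_irrelevant")
    # Grounded and relevant synthesis goes to a review surface by default.
    if claim_type == "synthesis":
        return ("needs_review", "supported_synthesis_needs_review")
    # Grounded, relevant, directly stated source fact.
    return ("show_final", "certified")
```

Calling `claim_release(True, "background_only", "source_fact")` takes the second branch and blocks the claim as `grounded_but_irrelevant`, which is the failure mode the contract is written to catch. Aggregating per-claim results into an overall `app_status` is left out of this sketch because the table does not fully specify it.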
From 875628d8d66d89693acf9a84086a4b761ff512c8 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:04:53 +0530 Subject: [PATCH 02/10] Add app answer release decision schema Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 206 +++++++++++++++++ Makefile | 8 + docs/app-answer-release-contract.md | 13 ++ schemas/README.md | 8 + ...os-app-answer-release-decision.schema.json | 215 ++++++++++++++++++ .../app-answer-release-decision.example.json | 62 +++++ schemas/validate_examples.py | 3 + 7 files changed, 515 insertions(+) create mode 100644 .github/scripts/test_app_answer_release_contract.py create mode 100644 schemas/ethos-app-answer-release-decision.schema.json create mode 100644 schemas/examples/app-answer-release-decision.example.json diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py new file mode 100644 index 0000000..2a870bc --- /dev/null +++ b/.github/scripts/test_app_answer_release_contract.py @@ -0,0 +1,206 @@ +#!/usr/bin/env python3 +# +# Copyright 2026 The Ethos maintainers +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +from __future__ import annotations + +import json +import unittest +from pathlib import Path + +from jsonschema import Draft202012Validator +from makefile_guard import makefile_text, target_block + + +ROOT = Path(__file__).resolve().parents[2] +CONTRACT_DOC = ROOT / "docs/app-answer-release-contract.md" +SCHEMA = ROOT / "schemas/ethos-app-answer-release-decision.schema.json" +EXAMPLE = ROOT / "schemas/examples/app-answer-release-decision.example.json" +VALIDATE_EXAMPLES = ROOT / "schemas/validate_examples.py" +SCHEMAS_README = ROOT / "schemas/README.md" +README = ROOT / "README.md" +SPEC = ROOT / "SPEC.md" + +EXPECTED_TARGET_COMMANDS = [ + "$(PYTHON) schemas/validate_examples.py", + "$(PYTHON) .github/scripts/test_app_answer_release_contract.py", + "$(PYTHON) .github/scripts/claims_gate.py", + "$(PYTHON) .github/scripts/public_boundary_claims_gate.py", + "git diff --check", +] +EXPECTED_PROOF_STATUS = ["verified", "partially_verified", "unverified"] +EXPECTED_PROOF_LIMITATIONS = [ + "capability_limited", + "stale_fingerprint", + "unsupported_claim_kind", + "non_grounded_checks", + "semantic_unverified", +] +EXPECTED_APP_STATUSES = [ + "certified", + "partial_certified", + "supported_synthesis_needs_review", + "grounded_but_irrelevant", + "cannot_answer_from_sources", +] +EXPECTED_RELEVANCE = ["direct_answer", "supports_answer", "background_only", "unrelated"] +EXPECTED_CLAIM_TYPES = ["source_fact", "synthesis", "unsupported"] +EXPECTED_RELEASE_ACTIONS = ["show_final", "needs_review", "block"] +EXPECTED_RELEASE_REASONS = [ + "certified", + "supported_synthesis_needs_review", + "grounded_but_irrelevant", + "cannot_answer_from_sources", +] + + +def load_json(path: Path) -> dict: + return json.loads(path.read_text(encoding="utf-8")) + + +def schema_enum(name: str) -> list[str]: + return load_json(SCHEMA)["$defs"][name]["enum"] + + +def claim_by_id(example: dict) -> dict[str, dict]: + return {claim["id"]: claim for claim in example["claims"]} + + +def 
normalized_markdown(path: Path) -> str: + return " ".join(path.read_text(encoding="utf-8").split()) + + +class AppAnswerReleaseContractTests(unittest.TestCase): + def test_schema_validates_example(self) -> None: + schema = load_json(SCHEMA) + example = load_json(EXAMPLE) + + Draft202012Validator.check_schema(schema) + errors = sorted( + Draft202012Validator(schema).iter_errors(example), + key=lambda error: list(error.absolute_path), + ) + + self.assertEqual([], errors) + + def test_schema_vocabulary_matches_contract(self) -> None: + self.assertEqual(EXPECTED_PROOF_STATUS, schema_enum("proof_status")) + self.assertEqual(EXPECTED_PROOF_LIMITATIONS, schema_enum("proof_limitation")) + self.assertEqual(EXPECTED_APP_STATUSES, schema_enum("app_status")) + self.assertEqual(EXPECTED_RELEVANCE, schema_enum("question_relevance")) + self.assertEqual(EXPECTED_CLAIM_TYPES, schema_enum("claim_type")) + self.assertEqual(EXPECTED_RELEASE_ACTIONS, schema_enum("release_action")) + self.assertEqual(EXPECTED_RELEASE_REASONS, schema_enum("release_reason")) + + def test_example_covers_relevance_and_synthesis_failure_modes(self) -> None: + example = load_json(EXAMPLE) + claims = claim_by_id(example) + + self.assertEqual("partial_certified", example["app_status"]) + self.assertEqual("partially_verified", example["grounding"]["proof_status"]) + + irrelevant = claims["claim-office-background"] + self.assertTrue(irrelevant["citation_grounded"]) + self.assertEqual("background_only", irrelevant["question_relevance"]) + self.assertEqual("block", irrelevant["release_action"]) + self.assertEqual("grounded_but_irrelevant", irrelevant["release_reason"]) + + synthesis = claims["claim-growth-driver"] + self.assertTrue(synthesis["citation_grounded"]) + self.assertEqual("synthesis", synthesis["claim_type"]) + self.assertEqual("needs_review", synthesis["release_action"]) + self.assertEqual("supported_synthesis_needs_review", synthesis["release_reason"]) + + unsupported = claims["claim-margin"] + 
self.assertFalse(unsupported["citation_grounded"]) + self.assertEqual("unsupported", unsupported["claim_type"]) + self.assertEqual("cannot_answer_from_sources", unsupported["release_reason"]) + + def test_final_answer_ids_are_only_certified_source_facts(self) -> None: + example = load_json(EXAMPLE) + claims = claim_by_id(example) + + for claim_id in example["final_answer_claim_ids"]: + claim = claims[claim_id] + self.assertEqual("show_final", claim["release_action"]) + self.assertEqual("certified", claim["release_reason"]) + self.assertTrue(claim["citation_grounded"]) + self.assertIn(claim["question_relevance"], ["direct_answer", "supports_answer"]) + self.assertEqual("source_fact", claim["claim_type"]) + + for claim_id in example["review_claim_ids"]: + self.assertEqual("needs_review", claims[claim_id]["release_action"]) + + for claim_id in example["blocked_claim_ids"]: + self.assertEqual("block", claims[claim_id]["release_action"]) + + def test_schema_registry_validates_example(self) -> None: + text = VALIDATE_EXAMPLES.read_text(encoding="utf-8") + + self.assertIn('"ethos-app-answer-release-decision.schema.json"', text) + self.assertIn('EXAMPLES / "app-answer-release-decision.example.json"', text) + + def test_schema_readme_registers_non_canonical_wrapper(self) -> None: + text = SCHEMAS_README.read_text(encoding="utf-8") + + self.assertIn("`ethos-app-answer-release-decision.schema.json`", text) + self.assertIn("`schemas/examples/app-answer-release-decision.example.json`", text) + self.assertIn("not `verification_report.json`", text) + + def test_docs_link_schema_and_keep_boundary_explicit(self) -> None: + text = CONTRACT_DOC.read_text(encoding="utf-8") + normalized = normalized_markdown(CONTRACT_DOC) + + self.assertIn("`schemas/ethos-app-answer-release-decision.schema.json`", text) + self.assertIn("`schemas/examples/app-answer-release-decision.example.json`", text) + self.assertIn("not a replacement for `verification_report.json`", normalized) + 
self.assertIn("Ethos verified citation grounding.", text) + for token in EXPECTED_APP_STATUSES + EXPECTED_RELEVANCE + EXPECTED_CLAIM_TYPES: + self.assertIn(f"`{token}`", text) + + def test_readme_and_spec_point_to_contract(self) -> None: + for path in [README, SPEC]: + self.assertIn( + "docs/app-answer-release-contract.md", + path.read_text(encoding="utf-8"), + str(path), + ) + + def test_make_target_composes_scoped_guard(self) -> None: + text = makefile_text() + self.assertIn("app-answer-release-contract", text) + + commands = [ + line.strip() + for line in target_block("app-answer-release-contract").splitlines() + if line.strip() + ] + + self.assertEqual(EXPECTED_TARGET_COMMANDS, commands) + block = target_block("app-answer-release-contract") + for out_of_scope in [ + "cargo publish", + "gh release", + "release-candidate-prep", + "milestone-e-prep", + "cargo test", + "npm", + ]: + self.assertNotIn(out_of_scope, block) + + +if __name__ == "__main__": + unittest.main() diff --git a/Makefile b/Makefile index 2feb193..1c9be05 100644 --- a/Makefile +++ b/Makefile @@ -19,6 +19,7 @@ LAYOUT_EVALUATOR_OUT ?= $(ROOT)/target/layout-evaluator-alpha .PHONY: milestone-d-grounding-source-contract .PHONY: milestone-d-crop-element-surface-shape-contract .PHONY: milestone-d-claim-kind-boundary-contract +.PHONY: app-answer-release-contract $(ETHOS_BIN): cargo build --locked -p ethos-cli @@ -61,6 +62,13 @@ evidence-anchor-v1-contract: $(PYTHON) .github/scripts/test_evidence_anchor_v1_contract.py git diff --check +app-answer-release-contract: + $(PYTHON) schemas/validate_examples.py + $(PYTHON) .github/scripts/test_app_answer_release_contract.py + $(PYTHON) .github/scripts/claims_gate.py + $(PYTHON) .github/scripts/public_boundary_claims_gate.py + git diff --check + milestone-d-verify-citations-contract: cargo test --locked -p ethos-cli --test verify $(PYTHON) schemas/validate_examples.py diff --git a/docs/app-answer-release-contract.md b/docs/app-answer-release-contract.md index 
95ed63a..b6252d4 100644 --- a/docs/app-answer-release-contract.md +++ b/docs/app-answer-release-contract.md @@ -21,6 +21,13 @@ The canonical audit artifact remains `verification_report.json`, governed by product wording, reusable-check selection, and API wrapper policy. That derived summary is not a replacement for the canonical report. +Applications that want a machine-readable wrapper decision envelope may use +`schemas/ethos-app-answer-release-decision.schema.json`. The example at +`schemas/examples/app-answer-release-decision.example.json` shows how to combine a derived Ethos +proof summary with app-owned relevance and synthesis labels. This envelope is not a replacement for +`verification_report.json`; it records an application release decision above the Ethos grounding +result. + If a wrapper exposes `invalid_request`, that status is a process or API envelope for malformed input, invalid configuration, adapter failure, or usage errors. It is not derived from a `VerificationReport`. @@ -79,6 +86,12 @@ Suggested `claim_type` values: These labels may come from application policy, a reviewed model output schema, human review, or a separate evaluator. They are outside the canonical Ethos verification report. +Suggested `release_action` values for a wrapper decision envelope: + +- `show_final`: release the claim in the final answer. +- `needs_review`: keep the claim in a review surface. +- `block`: keep the claim out of the final answer. 
+ ## Release Rules A conservative first application policy is: diff --git a/schemas/README.md b/schemas/README.md index 2f7b61b..17913de 100644 --- a/schemas/README.md +++ b/schemas/README.md @@ -11,6 +11,7 @@ bumps and downstream sign-off; output-changing heuristics are semver events (PRD | `ethos-security-report.schema.json` | `security_report.json` | | `ethos-citations.schema.json` | citation input consumed by `ethos verify --citations` | | `ethos-verification-report.schema.json` | `verification_report.json` | +| `ethos-app-answer-release-decision.schema.json` | non-canonical app wrapper decision envelope for answer release policy over Ethos proof summaries | | `ethos-evidence-anchor-request.schema.json` | evidence refs consumed by `ethos evidence anchor --evidence-refs` | | `ethos-evidence-anchor-report.schema.json` | `evidence_anchor_report.json` emitted by `ethos evidence anchor` | | `ethos-evidence-anchor-contract.schema.json` | `evidence_anchor` v1 source-only public beta evaluation guard inventory | @@ -53,6 +54,13 @@ security-report / verification-report examples). `verification-report-negative.example.json` shows a non-grounded report with a per-check `reason` label. +App answer release guidance is tracked in `docs/app-answer-release-contract.md`. The schema +`ethos-app-answer-release-decision.schema.json` and example +`schemas/examples/app-answer-release-decision.example.json` describe a non-canonical wrapper +decision envelope for apps that combine Ethos citation grounding with app-owned relevance and +synthesis policy. This envelope is not `verification_report.json`, is not emitted by the verifier, +and is not a replacement for the canonical audit report. + Evidence-anchor V1 guard work is tracked in `docs/evidence-anchor-v1-contract.md`. 
In this source-only public beta evaluation guard, `evidence_anchor` names the deterministic source-tracing contract currently carried by `ethos evidence anchor`; it does not add semantic answer diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json new file mode 100644 index 0000000..9ae31b7 --- /dev/null +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -0,0 +1,215 @@ +{ + "$schema": "https://json-schema.org/draft/2020-12/schema", + "$id": "urn:ethos:schema:app-answer-release-decision:1", + "title": "Ethos app answer release decision", + "description": "Non-canonical wrapper artifact for applications that combine Ethos citation-grounding proof summaries with app-owned question relevance and synthesis policy. This is not verification_report.json and is not emitted by the verifier.", + "type": "object", + "required": [ + "artifact_type", + "schema_version", + "question", + "grounding", + "app_status", + "claims", + "final_answer_claim_ids", + "review_claim_ids", + "blocked_claim_ids" + ], + "additionalProperties": false, + "properties": { + "artifact_type": { "const": "ethos.app_answer_release_decision.v1" }, + "schema_version": { "const": "1.0.0" }, + "question": { + "type": "string", + "minLength": 1, + "description": "Original user question evaluated by the app-layer relevance policy." + }, + "grounding": { + "type": "object", + "required": [ + "verification_report_ref", + "proof_status", + "request_certified", + "reusable_grounded_check_ids", + "needs_review_check_ids", + "proof_limitations" + ], + "additionalProperties": false, + "properties": { + "verification_report_ref": { + "type": "string", + "minLength": 1, + "description": "Application-local pointer to the canonical verification_report.json used for audit." 
+ }, + "proof_status": { "$ref": "#/$defs/proof_status" }, + "request_certified": { "type": "boolean" }, + "reusable_grounded_check_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "uniqueItems": true + }, + "needs_review_check_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "uniqueItems": true + }, + "proof_limitations": { + "type": "array", + "items": { "$ref": "#/$defs/proof_limitation" }, + "uniqueItems": true + } + } + }, + "app_status": { "$ref": "#/$defs/app_status" }, + "claims": { + "type": "array", + "items": { "$ref": "#/$defs/claim_decision" } + }, + "final_answer_claim_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "uniqueItems": true + }, + "review_claim_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "uniqueItems": true + }, + "blocked_claim_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "uniqueItems": true + }, + "notes": { + "type": "array", + "items": { "type": "string" } + } + }, + "$defs": { + "proof_status": { + "enum": ["verified", "partially_verified", "unverified"] + }, + "proof_limitation": { + "enum": [ + "capability_limited", + "stale_fingerprint", + "unsupported_claim_kind", + "non_grounded_checks", + "semantic_unverified" + ] + }, + "app_status": { + "enum": [ + "certified", + "partial_certified", + "supported_synthesis_needs_review", + "grounded_but_irrelevant", + "cannot_answer_from_sources" + ] + }, + "question_relevance": { + "enum": ["direct_answer", "supports_answer", "background_only", "unrelated"] + }, + "claim_type": { + "enum": ["source_fact", "synthesis", "unsupported"] + }, + "release_action": { + "enum": ["show_final", "needs_review", "block"] + }, + "release_reason": { + "enum": [ + "certified", + "supported_synthesis_needs_review", + "grounded_but_irrelevant", + "cannot_answer_from_sources" + ] + }, + "claim_decision": { + "type": "object", + "required": [ + "id", + "text", 
+ "citation_grounded", + "question_relevance", + "claim_type", + "release_action", + "release_reason" + ], + "additionalProperties": false, + "properties": { + "id": { "type": "string", "minLength": 1 }, + "text": { "type": "string", "minLength": 1 }, + "check_id": { + "type": "string", + "minLength": 1, + "description": "Ethos verification check ID when the claim came from a verification report check." + }, + "citation_grounded": { "type": "boolean" }, + "question_relevance": { "$ref": "#/$defs/question_relevance" }, + "claim_type": { "$ref": "#/$defs/claim_type" }, + "release_action": { "$ref": "#/$defs/release_action" }, + "release_reason": { "$ref": "#/$defs/release_reason" } + }, + "allOf": [ + { + "if": { + "properties": { "release_reason": { "const": "certified" } }, + "required": ["release_reason"] + }, + "then": { + "properties": { + "citation_grounded": { "const": true }, + "question_relevance": { "enum": ["direct_answer", "supports_answer"] }, + "claim_type": { "const": "source_fact" }, + "release_action": { "const": "show_final" } + } + } + }, + { + "if": { + "properties": { + "release_reason": { "const": "supported_synthesis_needs_review" } + }, + "required": ["release_reason"] + }, + "then": { + "properties": { + "citation_grounded": { "const": true }, + "question_relevance": { "enum": ["direct_answer", "supports_answer"] }, + "claim_type": { "const": "synthesis" }, + "release_action": { "const": "needs_review" } + } + } + }, + { + "if": { + "properties": { "release_reason": { "const": "grounded_but_irrelevant" } }, + "required": ["release_reason"] + }, + "then": { + "properties": { + "citation_grounded": { "const": true }, + "question_relevance": { "enum": ["background_only", "unrelated"] }, + "release_action": { "const": "block" } + } + } + }, + { + "if": { + "properties": { "release_reason": { "const": "cannot_answer_from_sources" } }, + "required": ["release_reason"] + }, + "then": { + "properties": { + "release_action": { "const": "block" } + }, 
+ "anyOf": [ + { "properties": { "citation_grounded": { "const": false } } }, + { "properties": { "claim_type": { "const": "unsupported" } } } + ] + } + } + ] + } + } +} diff --git a/schemas/examples/app-answer-release-decision.example.json b/schemas/examples/app-answer-release-decision.example.json new file mode 100644 index 0000000..a238729 --- /dev/null +++ b/schemas/examples/app-answer-release-decision.example.json @@ -0,0 +1,62 @@ +{ + "artifact_type": "ethos.app_answer_release_decision.v1", + "schema_version": "1.0.0", + "question": "What was Q3 2025 revenue?", + "grounding": { + "verification_report_ref": "verification_report.json", + "proof_status": "partially_verified", + "request_certified": false, + "reusable_grounded_check_ids": ["v0001", "v0002", "v0003"], + "needs_review_check_ids": ["v0004"], + "proof_limitations": ["non_grounded_checks"] + }, + "app_status": "partial_certified", + "claims": [ + { + "id": "claim-revenue", + "text": "Revenue grew to $12.4M in Q3 2025.", + "check_id": "v0001", + "citation_grounded": true, + "question_relevance": "direct_answer", + "claim_type": "source_fact", + "release_action": "show_final", + "release_reason": "certified" + }, + { + "id": "claim-office-background", + "text": "The company opened a European office.", + "check_id": "v0002", + "citation_grounded": true, + "question_relevance": "background_only", + "claim_type": "source_fact", + "release_action": "block", + "release_reason": "grounded_but_irrelevant" + }, + { + "id": "claim-growth-driver", + "text": "Q3 revenue growth was likely driven by enterprise expansion.", + "check_id": "v0003", + "citation_grounded": true, + "question_relevance": "supports_answer", + "claim_type": "synthesis", + "release_action": "needs_review", + "release_reason": "supported_synthesis_needs_review" + }, + { + "id": "claim-margin", + "text": "Gross margin improved in Q3 2025.", + "check_id": "v0004", + "citation_grounded": false, + "question_relevance": "direct_answer", + 
"claim_type": "unsupported", + "release_action": "block", + "release_reason": "cannot_answer_from_sources" + } + ], + "final_answer_claim_ids": ["claim-revenue"], + "review_claim_ids": ["claim-growth-driver"], + "blocked_claim_ids": ["claim-office-background", "claim-margin"], + "notes": [ + "This app-layer envelope is not verification_report.json; it records release policy above Ethos grounding." + ] +} diff --git a/schemas/validate_examples.py b/schemas/validate_examples.py index 7f9ec0c..6d77f9d 100644 --- a/schemas/validate_examples.py +++ b/schemas/validate_examples.py @@ -78,6 +78,9 @@ EXAMPLES / "verification-report.example.json", EXAMPLES / "verification-report-negative.example.json", ]), + ("ethos-app-answer-release-decision.schema.json", [ + EXAMPLES / "app-answer-release-decision.example.json", + ]), ("ethos-evidence-anchor-request.schema.json", [ EXAMPLES / "evidence-anchor-request.example.json", ]), From 092a01b41b6186e024464d9be915dd5d75a1dcd6 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:15:05 +0530 Subject: [PATCH 03/10] Add Python app answer release helper Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 3 + docs/app-answer-release-contract.md | 6 +- python/README.md | 28 ++ python/ethos_pdf/__init__.py | 2 + python/ethos_pdf/_cli.py | 263 ++++++++++++++++++ python/tests/test_cli_surface.py | 164 +++++++++++ ...os-app-answer-release-decision.schema.json | 12 + .../app-answer-release-decision.example.json | 2 +- 8 files changed, 478 insertions(+), 2 deletions(-) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 2a870bc..cdbbcd2 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -120,6 +120,7 @@ def test_example_covers_relevance_and_synthesis_failure_modes(self) -> None: synthesis = claims["claim-growth-driver"] 
self.assertTrue(synthesis["citation_grounded"]) + self.assertEqual(["v0001", "v0003"], synthesis["check_ids"]) self.assertEqual("synthesis", synthesis["claim_type"]) self.assertEqual("needs_review", synthesis["release_action"]) self.assertEqual("supported_synthesis_needs_review", synthesis["release_reason"]) @@ -167,6 +168,8 @@ def test_docs_link_schema_and_keep_boundary_explicit(self) -> None: self.assertIn("`schemas/ethos-app-answer-release-decision.schema.json`", text) self.assertIn("`schemas/examples/app-answer-release-decision.example.json`", text) self.assertIn("not a replacement for `verification_report.json`", normalized) + self.assertIn("`app_answer_release_decision(...)`", text) + self.assertIn("`check_ids`", text) self.assertIn("Ethos verified citation grounding.", text) for token in EXPECTED_APP_STATUSES + EXPECTED_RELEVANCE + EXPECTED_CLAIM_TYPES: self.assertIn(f"`{token}`", text) diff --git a/docs/app-answer-release-contract.md b/docs/app-answer-release-contract.md index b6252d4..e622ca2 100644 --- a/docs/app-answer-release-contract.md +++ b/docs/app-answer-release-contract.md @@ -26,7 +26,8 @@ Applications that want a machine-readable wrapper decision envelope may use `schemas/examples/app-answer-release-decision.example.json` shows how to combine a derived Ethos proof summary with app-owned relevance and synthesis labels. This envelope is not a replacement for `verification_report.json`; it records an application release decision above the Ethos grounding -result. +result. Python apps can build the same envelope with `app_answer_release_decision(...)` after they +have supplied relevance and synthesis labels. If a wrapper exposes `invalid_request`, that status is a process or API envelope for malformed input, invalid configuration, adapter failure, or usage errors. It is not derived from a @@ -92,6 +93,9 @@ Suggested `release_action` values for a wrapper decision envelope: - `needs_review`: keep the claim in a review surface. 
- `block`: keep the claim out of the final answer. +Synthesis claims that combine multiple grounded facts should carry every referenced Ethos check ID +in `check_ids`; all referenced checks must be reusable before the synthesis is reviewable. + ## Release Rules A conservative first application policy is: diff --git a/python/README.md b/python/README.md index 781b216..8114687 100644 --- a/python/README.md +++ b/python/README.md @@ -34,6 +34,7 @@ Public API: - `crop_element` - `verify` - `proof_summary` +- `app_answer_release_decision` - `anchor` The current module is intentionally thin: it shells out to a caller-provided local `ethos` CLI @@ -128,6 +129,33 @@ The summary is not a replacement for the canonical verification report. It deter derives `proof_status`, `request_certified`, reusable grounded check ids, needs-review check ids, and proof limitations from the report that `ethos verify` already emitted. +Use `app_answer_release_decision(...)` when an application has already labeled claim relevance and +synthesis, and wants the conservative release policy from `docs/app-answer-release-contract.md`: + +```python +from ethos_pdf import app_answer_release_decision, proof_summary + +summary = proof_summary(report) +decision = app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [ + { + "id": "claim-revenue", + "text": "Revenue grew to $12.4M in Q3 2025.", + "check_id": "v0001", + "question_relevance": "direct_answer", + "claim_type": "source_fact", + } + ], +) +print(decision["app_status"]) +``` + +The helper does not judge relevance or synthesis. Callers supply those labels; the helper applies +the release rule and requires referenced Ethos check IDs to be reusable before a claim can enter +the final answer. 
+ Run the focused tests with: ```sh diff --git a/python/ethos_pdf/__init__.py b/python/ethos_pdf/__init__.py index 97e23cf..fcfea07 100644 --- a/python/ethos_pdf/__init__.py +++ b/python/ethos_pdf/__init__.py @@ -28,6 +28,7 @@ ParseTimeoutError, PdfiumNotFoundError, anchor, + app_answer_release_decision, crop_element, parse_pdf_json, parse_pdf_markdown, @@ -50,6 +51,7 @@ "ParseTimeoutError", "PdfiumNotFoundError", "anchor", + "app_answer_release_decision", "crop_element", "parse_pdf_json", "parse_pdf_markdown", diff --git a/python/ethos_pdf/_cli.py b/python/ethos_pdf/_cli.py index 39b8138..e115075 100644 --- a/python/ethos_pdf/_cli.py +++ b/python/ethos_pdf/_cli.py @@ -28,6 +28,20 @@ _DEFAULT_CROP_CHECK_ID = "v0001" _CAPABILITY_LIMITED = "capability_limited" _GROUNDED = "grounded" +_ANSWER_RELEVANT = frozenset(("direct_answer", "supports_answer")) +_ANSWER_IRRELEVANT = frozenset(("background_only", "unrelated")) +_QUESTION_RELEVANCE = _ANSWER_RELEVANT | _ANSWER_IRRELEVANT +_CLAIM_TYPES = frozenset(("source_fact", "synthesis", "unsupported")) +_PROOF_STATUSES = frozenset(("verified", "partially_verified", "unverified")) +_PROOF_LIMITATIONS = frozenset( + ( + "capability_limited", + "stale_fingerprint", + "unsupported_claim_kind", + "non_grounded_checks", + "semantic_unverified", + ) +) class EthosPythonSurfaceError(Exception): @@ -683,6 +697,67 @@ def proof_summary(report: Mapping[str, Any]) -> Dict[str, Any]: } +def app_answer_release_decision( + question: str, + proof: Mapping[str, Any], + claims: Sequence[Mapping[str, Any]], + *, + verification_report_ref: str = "verification_report.json", + notes: Optional[Sequence[str]] = None, +) -> Dict[str, Any]: + """Build a non-canonical app answer release decision envelope. + + The caller owns `question_relevance` and `claim_type` labels. This helper + only applies the app-release policy from + `docs/app-answer-release-contract.md` against an Ethos proof summary or a + canonical verification report. 
It never judges relevance or synthesis by + itself. + """ + + if not isinstance(question, str) or not question.strip(): + raise ValueError("question must be a non-empty string") + if not isinstance(verification_report_ref, str) or not verification_report_ref: + raise ValueError("verification_report_ref must be a non-empty string") + if not isinstance(claims, Sequence) or isinstance(claims, (str, bytes)): + raise ValueError("claims must be a sequence of mappings") + + grounding = _coerce_proof_summary(proof) + reusable = set(grounding["reusable_grounded_check_ids"]) + needs_review = set(grounding["needs_review_check_ids"]) + + decisions = [ + _app_claim_decision(claim, reusable, needs_review) + for claim in claims + ] + final_ids = [ + claim["id"] for claim in decisions if claim["release_action"] == "show_final" + ] + review_ids = [ + claim["id"] for claim in decisions if claim["release_action"] == "needs_review" + ] + blocked_ids = [ + claim["id"] for claim in decisions if claim["release_action"] == "block" + ] + + envelope: Dict[str, Any] = { + "artifact_type": "ethos.app_answer_release_decision.v1", + "schema_version": "1.0.0", + "question": question, + "grounding": { + "verification_report_ref": verification_report_ref, + **grounding, + }, + "app_status": _app_status(decisions), + "claims": decisions, + "final_answer_claim_ids": final_ids, + "review_claim_ids": review_ids, + "blocked_claim_ids": blocked_ids, + } + if notes is not None: + envelope["notes"] = _string_list(notes, "notes") + return envelope + + def _validate_timeout_seconds(timeout_seconds: Optional[float]) -> None: if timeout_seconds is not None and timeout_seconds <= 0: raise ValueError("timeout_seconds must be greater than zero when provided") @@ -735,6 +810,194 @@ def _has_capability_limit( ) +def _coerce_proof_summary(proof: Mapping[str, Any]) -> Dict[str, Any]: + if not isinstance(proof, Mapping): + raise ValueError("proof must be a verification report or proof summary mapping") + if 
"proof_status" not in proof and "all_evidence_grounded" in proof: + proof = proof_summary(proof) + + status = proof.get("proof_status") + if status not in _PROOF_STATUSES: + raise ValueError("proof_status must be verified, partially_verified, or unverified") + request_certified = proof.get("request_certified") + if not isinstance(request_certified, bool): + raise ValueError("request_certified must be a boolean") + reusable = _string_list( + proof.get("reusable_grounded_check_ids"), + "reusable_grounded_check_ids", + ) + needs_review = _string_list( + proof.get("needs_review_check_ids"), + "needs_review_check_ids", + ) + limitations = _string_list(proof.get("proof_limitations"), "proof_limitations") + unknown_limitations = [ + limitation for limitation in limitations if limitation not in _PROOF_LIMITATIONS + ] + if unknown_limitations: + raise ValueError(f"unknown proof_limitations: {', '.join(unknown_limitations)}") + return { + "proof_status": status, + "request_certified": request_certified, + "reusable_grounded_check_ids": reusable, + "needs_review_check_ids": needs_review, + "proof_limitations": limitations, + } + + +def _app_claim_decision( + claim: Mapping[str, Any], + reusable_check_ids: set[str], + needs_review_check_ids: set[str], +) -> Dict[str, Any]: + if not isinstance(claim, Mapping): + raise ValueError("each claim must be a mapping") + + claim_id = _required_string(claim, "id") + text = _required_string(claim, "text") + relevance = _required_string(claim, "question_relevance") + if relevance not in _QUESTION_RELEVANCE: + raise ValueError(f"unknown question_relevance for claim {claim_id}: {relevance}") + claim_type = _required_string(claim, "claim_type") + if claim_type not in _CLAIM_TYPES: + raise ValueError(f"unknown claim_type for claim {claim_id}: {claim_type}") + + check_ids = _claim_check_ids(claim, claim_id) + citation_grounded = _claim_citation_grounded( + claim, + claim_id, + check_ids, + reusable_check_ids, + needs_review_check_ids, + ) + if 
claim_type == "unsupported" and citation_grounded: + raise ValueError(f"unsupported claim {claim_id} cannot be citation_grounded") + + release_action, release_reason = _release_decision( + citation_grounded, + relevance, + claim_type, + ) + decision: Dict[str, Any] = { + "id": claim_id, + "text": text, + "citation_grounded": citation_grounded, + "question_relevance": relevance, + "claim_type": claim_type, + "release_action": release_action, + "release_reason": release_reason, + } + if "check_id" in claim: + decision["check_id"] = check_ids[0] + elif "check_ids" in claim: + decision["check_ids"] = check_ids + return decision + + +def _required_string(value: Mapping[str, Any], field: str) -> str: + item = value.get(field) + if not isinstance(item, str) or not item: + raise ValueError(f"{field} must be a non-empty string") + return item + + +def _string_list(value: Any, field: str) -> list[str]: + if not isinstance(value, (list, tuple)): + raise ValueError(f"{field} must be a list of strings") + result = [] + for item in value: + if not isinstance(item, str) or not item: + raise ValueError(f"{field} must be a list of non-empty strings") + result.append(item) + if len(set(result)) != len(result): + raise ValueError(f"{field} must not contain duplicates") + return result + + +def _claim_check_ids(claim: Mapping[str, Any], claim_id: str) -> list[str]: + has_check_id = "check_id" in claim + has_check_ids = "check_ids" in claim + if has_check_id and has_check_ids: + raise ValueError(f"claim {claim_id} must use either check_id or check_ids, not both") + if has_check_id: + return [_required_string(claim, "check_id")] + if has_check_ids: + check_ids = _string_list(claim.get("check_ids"), "check_ids") + if not check_ids: + raise ValueError(f"claim {claim_id} check_ids must not be empty") + return check_ids + return [] + + +def _claim_citation_grounded( + claim: Mapping[str, Any], + claim_id: str, + check_ids: Sequence[str], + reusable_check_ids: set[str], + 
needs_review_check_ids: set[str], +) -> bool: + provided = claim.get("citation_grounded") + if provided is not None and not isinstance(provided, bool): + raise ValueError(f"citation_grounded for claim {claim_id} must be a boolean") + + if check_ids: + known = reusable_check_ids | needs_review_check_ids + unknown = [check_id for check_id in check_ids if check_id not in known] + if unknown: + raise ValueError( + f"claim {claim_id} references unknown check ids: {', '.join(unknown)}" + ) + computed = all(check_id in reusable_check_ids for check_id in check_ids) + if provided is not None and provided != computed: + raise ValueError( + f"citation_grounded for claim {claim_id} conflicts with proof summary" + ) + return computed + + if provided is None: + raise ValueError( + f"claim {claim_id} without check_id/check_ids must set citation_grounded" + ) + return provided + + +def _release_decision( + citation_grounded: bool, + relevance: str, + claim_type: str, +) -> tuple[str, str]: + if ( + citation_grounded + and relevance in _ANSWER_RELEVANT + and claim_type == "source_fact" + ): + return "show_final", "certified" + if ( + citation_grounded + and relevance in _ANSWER_RELEVANT + and claim_type == "synthesis" + ): + return "needs_review", "supported_synthesis_needs_review" + if citation_grounded and relevance in _ANSWER_IRRELEVANT: + return "block", "grounded_but_irrelevant" + return "block", "cannot_answer_from_sources" + + +def _app_status(claims: Sequence[Mapping[str, Any]]) -> str: + has_final = any(claim["release_action"] == "show_final" for claim in claims) + has_review = any(claim["release_action"] == "needs_review" for claim in claims) + has_blocked = any(claim["release_action"] == "block" for claim in claims) + if has_final and not has_review and not has_blocked: + return "certified" + if has_final: + return "partial_certified" + if has_review: + return "supported_synthesis_needs_review" + if any(claim["release_reason"] == "grounded_but_irrelevant" for claim in 
claims): + return "grounded_but_irrelevant" + return "cannot_answer_from_sources" + + def _list_value(value: Any) -> Sequence[Any]: if isinstance(value, list): return value diff --git a/python/tests/test_cli_surface.py b/python/tests/test_cli_surface.py index 1495e0f..fee256d 100644 --- a/python/tests/test_cli_surface.py +++ b/python/tests/test_cli_surface.py @@ -34,6 +34,7 @@ ParseTimeoutError, PdfiumNotFoundError, anchor, + app_answer_release_decision, crop_element, parse_pdf_json, proof_summary, @@ -576,6 +577,169 @@ def test_proof_summary_excludes_stale_and_semantic_grounded_checks(self) -> None self.assertEqual(semantic["needs_review_check_ids"], ["v0001"]) self.assertEqual(semantic["proof_limitations"], ["semantic_unverified"]) + def test_app_answer_release_decision_applies_relevance_and_synthesis_policy( + self, + ) -> None: + summary = { + "proof_status": "partially_verified", + "request_certified": False, + "reusable_grounded_check_ids": ["v0001", "v0002", "v0003"], + "needs_review_check_ids": ["v0004"], + "proof_limitations": ["non_grounded_checks"], + } + + result = app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [ + { + "id": "claim-revenue", + "text": "Revenue grew to $12.4M in Q3 2025.", + "check_id": "v0001", + "question_relevance": "direct_answer", + "claim_type": "source_fact", + }, + { + "id": "claim-background", + "text": "The company opened a European office.", + "check_id": "v0002", + "question_relevance": "background_only", + "claim_type": "source_fact", + }, + { + "id": "claim-synthesis", + "text": "Revenue growth was likely driven by enterprise expansion.", + "check_ids": ["v0001", "v0003"], + "question_relevance": "supports_answer", + "claim_type": "synthesis", + }, + { + "id": "claim-margin", + "text": "Gross margin improved in Q3 2025.", + "check_id": "v0004", + "question_relevance": "direct_answer", + "claim_type": "unsupported", + }, + ], + verification_report_ref="reports/q3-verification.json", + 
notes=["application-owned relevance labels"], + ) + + self.assertEqual( + result["artifact_type"], + "ethos.app_answer_release_decision.v1", + ) + self.assertEqual(result["app_status"], "partial_certified") + self.assertEqual(result["grounding"]["verification_report_ref"], "reports/q3-verification.json") + self.assertEqual(result["final_answer_claim_ids"], ["claim-revenue"]) + self.assertEqual(result["review_claim_ids"], ["claim-synthesis"]) + self.assertEqual(result["blocked_claim_ids"], ["claim-background", "claim-margin"]) + self.assertEqual(result["claims"][0]["release_reason"], "certified") + self.assertTrue(result["claims"][0]["citation_grounded"]) + self.assertEqual(result["claims"][1]["release_reason"], "grounded_but_irrelevant") + self.assertEqual(result["claims"][2]["release_action"], "needs_review") + self.assertEqual(result["claims"][2]["check_ids"], ["v0001", "v0003"]) + self.assertFalse(result["claims"][3]["citation_grounded"]) + self.assertEqual(result["claims"][3]["release_reason"], "cannot_answer_from_sources") + self.assertEqual(result["notes"], ["application-owned relevance labels"]) + + def test_app_answer_release_decision_accepts_verification_report(self) -> None: + report = { + "all_evidence_grounded": True, + "fingerprint_stale": False, + "capability_limits": [], + "unsupported_claim_kinds": [], + "warnings": [], + "checks": [ + { + "id": "v0001", + "status": "grounded", + "semantic_unverified": False, + "warnings": [], + } + ], + } + + result = app_answer_release_decision( + "What was Q3 2025 revenue?", + report, + [ + { + "id": "claim-revenue", + "text": "Revenue grew to $12.4M in Q3 2025.", + "check_id": "v0001", + "question_relevance": "direct_answer", + "claim_type": "source_fact", + } + ], + ) + + self.assertEqual(result["app_status"], "certified") + self.assertEqual(result["grounding"]["proof_status"], "verified") + self.assertEqual(result["final_answer_claim_ids"], ["claim-revenue"]) + + def 
test_app_answer_release_decision_blocks_empty_source_answer(self) -> None: + summary = { + "proof_status": "unverified", + "request_certified": False, + "reusable_grounded_check_ids": [], + "needs_review_check_ids": [], + "proof_limitations": [], + } + + result = app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [], + ) + + self.assertEqual(result["app_status"], "cannot_answer_from_sources") + self.assertEqual(result["final_answer_claim_ids"], []) + self.assertEqual(result["review_claim_ids"], []) + self.assertEqual(result["blocked_claim_ids"], []) + + def test_app_answer_release_decision_rejects_inconsistent_or_unknown_checks( + self, + ) -> None: + summary = { + "proof_status": "partially_verified", + "request_certified": False, + "reusable_grounded_check_ids": ["v0001"], + "needs_review_check_ids": ["v0002"], + "proof_limitations": ["non_grounded_checks"], + } + + with self.assertRaises(ValueError): + app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [ + { + "id": "claim-bad", + "text": "Revenue grew.", + "check_id": "v0001", + "citation_grounded": False, + "question_relevance": "direct_answer", + "claim_type": "source_fact", + } + ], + ) + + with self.assertRaises(ValueError): + app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [ + { + "id": "claim-unknown", + "text": "Revenue grew.", + "check_id": "v9999", + "question_relevance": "direct_answer", + "claim_type": "source_fact", + } + ], + ) + def test_anchor_maps_source_evidence_refs_and_grounding(self) -> None: result = anchor( self.document, diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json index 9ae31b7..d876baa 100644 --- a/schemas/ethos-app-answer-release-decision.schema.json +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -144,6 +144,13 @@ "minLength": 1, "description": "Ethos verification check ID when the claim came from a verification report 
check." }, + "check_ids": { + "type": "array", + "items": { "type": "string", "minLength": 1 }, + "minItems": 1, + "uniqueItems": true, + "description": "Multiple Ethos verification check IDs used by a synthesis claim." + }, "citation_grounded": { "type": "boolean" }, "question_relevance": { "$ref": "#/$defs/question_relevance" }, "claim_type": { "$ref": "#/$defs/claim_type" }, @@ -151,6 +158,11 @@ "release_reason": { "$ref": "#/$defs/release_reason" } }, "allOf": [ + { + "not": { + "required": ["check_id", "check_ids"] + } + }, { "if": { "properties": { "release_reason": { "const": "certified" } }, diff --git a/schemas/examples/app-answer-release-decision.example.json b/schemas/examples/app-answer-release-decision.example.json index a238729..4c7cb75 100644 --- a/schemas/examples/app-answer-release-decision.example.json +++ b/schemas/examples/app-answer-release-decision.example.json @@ -35,7 +35,7 @@ { "id": "claim-growth-driver", "text": "Q3 revenue growth was likely driven by enterprise expansion.", - "check_id": "v0003", + "check_ids": ["v0001", "v0003"], "citation_grounded": true, "question_relevance": "supports_answer", "claim_type": "synthesis", From 1f4165dde4ab555539e2ccd4efb0631c8acfec50 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:22:16 +0530 Subject: [PATCH 04/10] Add Rust app answer release helper Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 3 +- Makefile | 1 + crates/ethos-core/README.md | 5 + crates/ethos-core/src/verify_types.rs | 675 ++++++++++++++++++ docs/app-answer-release-contract.md | 5 +- ...os-app-answer-release-decision.schema.json | 2 +- 6 files changed, 687 insertions(+), 4 deletions(-) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index cdbbcd2..2e6cf87 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -35,6 +35,7 @@ SPEC = ROOT / 
"SPEC.md" EXPECTED_TARGET_COMMANDS = [ + "cargo test --locked -p ethos-doc-core --no-default-features --features verify-types app_answer_release", "$(PYTHON) schemas/validate_examples.py", "$(PYTHON) .github/scripts/test_app_answer_release_contract.py", "$(PYTHON) .github/scripts/claims_gate.py", @@ -169,6 +170,7 @@ def test_docs_link_schema_and_keep_boundary_explicit(self) -> None: self.assertIn("`schemas/examples/app-answer-release-decision.example.json`", text) self.assertIn("not a replacement for `verification_report.json`", normalized) self.assertIn("`app_answer_release_decision(...)`", text) + self.assertIn("`derive_app_answer_release_decision(...)`", text) self.assertIn("`check_ids`", text) self.assertIn("Ethos verified citation grounding.", text) for token in EXPECTED_APP_STATUSES + EXPECTED_RELEVANCE + EXPECTED_CLAIM_TYPES: @@ -199,7 +201,6 @@ def test_make_target_composes_scoped_guard(self) -> None: "gh release", "release-candidate-prep", "milestone-e-prep", - "cargo test", "npm", ]: self.assertNotIn(out_of_scope, block) diff --git a/Makefile b/Makefile index 1c9be05..2c8d32a 100644 --- a/Makefile +++ b/Makefile @@ -63,6 +63,7 @@ evidence-anchor-v1-contract: git diff --check app-answer-release-contract: + cargo test --locked -p ethos-doc-core --no-default-features --features verify-types app_answer_release $(PYTHON) schemas/validate_examples.py $(PYTHON) .github/scripts/test_app_answer_release_contract.py $(PYTHON) .github/scripts/claims_gate.py diff --git a/crates/ethos-core/README.md b/crates/ethos-core/README.md index 3b189c9..685acdf 100644 --- a/crates/ethos-core/README.md +++ b/crates/ethos-core/README.md @@ -25,6 +25,11 @@ and the derived `VerificationReport::proof_summary()` helper used by CLI/API wra does not change the canonical JSON report; it deterministically labels whether a request is certified, partially reusable, or unverified. 
+The same feature also exposes `derive_app_answer_release_decision(...)` for applications that have +already labeled question relevance and synthesis. That helper builds the non-canonical app answer +release envelope documented in `docs/app-answer-release-contract.md`; it does not replace +`verification_report.json` and does not judge relevance itself. + ## Publication Boundary - Public installation from crates.io is available at `0.2.0`. diff --git a/crates/ethos-core/src/verify_types.rs b/crates/ethos-core/src/verify_types.rs index 15c3463..9cb6178 100644 --- a/crates/ethos-core/src/verify_types.rs +++ b/crates/ethos-core/src/verify_types.rs @@ -20,6 +20,10 @@ //! Lives behind `verify-types` (no parser internals) so `ethos-verify` can use these //! without ever seeing the canonical model or backend traits. +use std::collections::HashSet; +use std::error::Error; +use std::fmt; + use serde::{Deserialize, Serialize}; use crate::codes::WarningCode; @@ -383,6 +387,242 @@ pub struct ProofSummary { pub proof_limitations: Vec, } +/// Application-owned relevance label for an answer claim. +/// +/// Ethos does not infer this label. Wrappers supply it before applying answer-release policy. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum AppQuestionRelevance { + /// The grounded evidence directly answers the user question. + DirectAnswer, + /// The grounded evidence supports the answer but is not sufficient alone. + SupportsAnswer, + /// The grounded evidence is true but only background for the question. + BackgroundOnly, + /// The grounded evidence does not support the requested answer. + Unrelated, +} + +impl AppQuestionRelevance { + /// Stable snake_case label used by wrapper envelopes. 
+ pub fn as_str(self) -> &'static str { + match self { + AppQuestionRelevance::DirectAnswer => "direct_answer", + AppQuestionRelevance::SupportsAnswer => "supports_answer", + AppQuestionRelevance::BackgroundOnly => "background_only", + AppQuestionRelevance::Unrelated => "unrelated", + } + } +} + +/// Application-owned source/synthesis label for an answer claim. +/// +/// Ethos does not infer this label. Wrappers supply it before applying answer-release policy. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum AppClaimType { + /// The claim is directly stated by source evidence. + SourceFact, + /// The claim combines multiple grounded facts or adds reasoning across them. + Synthesis, + /// The claim cannot be traced to grounded source evidence. + Unsupported, +} + +impl AppClaimType { + /// Stable snake_case label used by wrapper envelopes. + pub fn as_str(self) -> &'static str { + match self { + AppClaimType::SourceFact => "source_fact", + AppClaimType::Synthesis => "synthesis", + AppClaimType::Unsupported => "unsupported", + } + } +} + +/// Application answer release action for a claim. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum AppReleaseAction { + /// Release the claim in the final answer. + ShowFinal, + /// Keep the claim in a review surface. + NeedsReview, + /// Keep the claim out of the final answer. + Block, +} + +impl AppReleaseAction { + /// Stable snake_case label used by wrapper envelopes. + pub fn as_str(self) -> &'static str { + match self { + AppReleaseAction::ShowFinal => "show_final", + AppReleaseAction::NeedsReview => "needs_review", + AppReleaseAction::Block => "block", + } + } +} + +/// Stable reason for an application answer release action. 
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum AppReleaseReason { + /// The claim is a grounded relevant source fact and can enter the final answer. + Certified, + /// The claim is grounded and relevant, but is synthesis that needs review. + SupportedSynthesisNeedsReview, + /// The claim is citation-grounded but not relevant to the user question. + GroundedButIrrelevant, + /// The sources do not provide a releasable answer claim. + CannotAnswerFromSources, +} + +impl AppReleaseReason { + /// Stable snake_case label used by wrapper envelopes. + pub fn as_str(self) -> &'static str { + match self { + AppReleaseReason::Certified => "certified", + AppReleaseReason::SupportedSynthesisNeedsReview => "supported_synthesis_needs_review", + AppReleaseReason::GroundedButIrrelevant => "grounded_but_irrelevant", + AppReleaseReason::CannotAnswerFromSources => "cannot_answer_from_sources", + } + } +} + +/// Application-level answer status after applying grounding, relevance, and synthesis policy. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum AppAnswerStatus { + /// Every submitted claim is releasable as a grounded relevant source fact. + Certified, + /// At least one claim is releasable, but some submitted claims are blocked or review-only. + PartialCertified, + /// No final claim is releasable, but grounded relevant synthesis exists for review. + SupportedSynthesisNeedsReview, + /// Grounded claims exist, but they are not relevant to the question. + GroundedButIrrelevant, + /// No relevant grounded source fact is available for final answer release. + CannotAnswerFromSources, +} + +impl AppAnswerStatus { + /// Stable snake_case label used by wrapper envelopes. 
+ pub fn as_str(self) -> &'static str { + match self { + AppAnswerStatus::Certified => "certified", + AppAnswerStatus::PartialCertified => "partial_certified", + AppAnswerStatus::SupportedSynthesisNeedsReview => "supported_synthesis_needs_review", + AppAnswerStatus::GroundedButIrrelevant => "grounded_but_irrelevant", + AppAnswerStatus::CannotAnswerFromSources => "cannot_answer_from_sources", + } + } +} + +/// Caller-supplied claim labels used to derive an application answer release decision. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct AppAnswerClaimInput { + /// Stable application claim id. + pub id: String, + /// Claim text being considered for release. + pub text: String, + /// Ethos verification check ids backing this claim. + pub check_ids: Vec<String>, + /// Optional caller-supplied grounding flag for claims without check ids. + pub citation_grounded: Option<bool>, + /// App-owned question relevance label. + pub question_relevance: AppQuestionRelevance, + /// App-owned source/synthesis label. + pub claim_type: AppClaimType, +} + +/// Claim-level decision inside an application answer release envelope. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct AppAnswerClaimDecision { + /// Stable application claim id. + pub id: String, + /// Claim text being considered for release. + pub text: String, + /// Ethos verification check ids backing this claim. + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub check_ids: Vec<String>, + /// Whether the claim's cited Ethos checks are grounded and reusable. + pub citation_grounded: bool, + /// App-owned question relevance label. + pub question_relevance: AppQuestionRelevance, + /// App-owned source/synthesis label. + pub claim_type: AppClaimType, + /// Application release action. + pub release_action: AppReleaseAction, + /// Stable reason for the release action. + pub release_reason: AppReleaseReason, +} + +/// Grounding section embedded in an application answer release decision envelope.
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct AppAnswerGrounding { + /// Application-local pointer to the canonical `verification_report.json` used for audit. + pub verification_report_ref: String, + /// Product-facing proof status derived from the canonical report. + pub proof_status: ProofStatus, + /// Whether the request as submitted is certified. + pub request_certified: bool, + /// Check ids that can be reused in downstream final answers. + pub reusable_grounded_check_ids: Vec<String>, + /// Check ids that must not be released without review or repair. + pub needs_review_check_ids: Vec<String>, + /// Limitations that qualify or explain the proof status. + pub proof_limitations: Vec, +} + +/// Non-canonical app answer release decision envelope. +/// +/// This is a wrapper artifact above Ethos grounding. It is not `verification_report.json`. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct AppAnswerReleaseDecision { + /// Artifact type discriminator. + pub artifact_type: String, + /// Schema version for the wrapper envelope. + pub schema_version: String, + /// Original user question evaluated by the app-layer relevance policy. + pub question: String, + /// Derived Ethos proof summary plus the audit report reference. + pub grounding: AppAnswerGrounding, + /// Application-level answer status. + pub app_status: AppAnswerStatus, + /// Claim-level release decisions. + pub claims: Vec<AppAnswerClaimDecision>, + /// Claim ids that may enter the final answer. + pub final_answer_claim_ids: Vec<String>, + /// Claim ids that should be held for review. + pub review_claim_ids: Vec<String>, + /// Claim ids that should be blocked from the final answer. + pub blocked_claim_ids: Vec<String>, + /// Optional application notes. + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub notes: Vec<String>, +} + +/// Error returned when application answer release inputs are inconsistent.
+#[derive(Debug, Clone, PartialEq, Eq)] +pub struct AppAnswerReleaseError { + message: String, +} + +impl AppAnswerReleaseError { + /// Human-readable validation message. + pub fn message(&self) -> &str { + &self.message + } +} + +impl fmt::Display for AppAnswerReleaseError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + f.write_str(&self.message) + } +} + +impl Error for AppAnswerReleaseError {} + impl VerificationReport { /// Derive a product-facing proof summary from this canonical report. pub fn proof_summary(&self) -> ProofSummary { @@ -448,6 +688,102 @@ pub fn is_reusable_grounded_check(report: &VerificationReport, check: &Check) -> !report.fingerprint_stale && check.status == CheckStatus::Grounded && !check.semantic_unverified } +/// Derive a non-canonical application answer release decision from a proof summary. +/// +/// The caller supplies relevance and synthesis labels. This helper only applies the release policy +/// from `docs/app-answer-release-contract.md` and checks that referenced Ethos check ids are known +/// and reusable before a claim enters the final answer. 
+pub fn derive_app_answer_release_decision( + question: impl Into, + proof: &ProofSummary, + claims: Vec, + verification_report_ref: impl Into, + notes: Vec, +) -> Result { + let question = question.into(); + if question.trim().is_empty() { + return Err(app_answer_release_error( + "question must be a non-empty string", + )); + } + let verification_report_ref = verification_report_ref.into(); + if verification_report_ref.is_empty() { + return Err(app_answer_release_error( + "verification_report_ref must be a non-empty string", + )); + } + for note in ¬es { + if note.is_empty() { + return Err(app_answer_release_error( + "notes must contain only non-empty strings", + )); + } + } + + let reusable_check_ids: HashSet<&str> = proof + .reusable_grounded_check_ids + .iter() + .map(String::as_str) + .collect(); + let needs_review_check_ids: HashSet<&str> = proof + .needs_review_check_ids + .iter() + .map(String::as_str) + .collect(); + let known_check_ids: HashSet<&str> = reusable_check_ids + .union(&needs_review_check_ids) + .copied() + .collect(); + + let mut seen_claim_ids = HashSet::new(); + let mut decisions = Vec::with_capacity(claims.len()); + for claim in claims { + decisions.push(app_answer_claim_decision( + claim, + &reusable_check_ids, + &known_check_ids, + &mut seen_claim_ids, + )?); + } + + let final_answer_claim_ids = decisions + .iter() + .filter(|claim| claim.release_action == AppReleaseAction::ShowFinal) + .map(|claim| claim.id.clone()) + .collect(); + let review_claim_ids = decisions + .iter() + .filter(|claim| claim.release_action == AppReleaseAction::NeedsReview) + .map(|claim| claim.id.clone()) + .collect(); + let blocked_claim_ids = decisions + .iter() + .filter(|claim| claim.release_action == AppReleaseAction::Block) + .map(|claim| claim.id.clone()) + .collect(); + let app_status = app_answer_status(&decisions); + + Ok(AppAnswerReleaseDecision { + artifact_type: "ethos.app_answer_release_decision.v1".to_string(), + schema_version: "1.0.0".to_string(), 
+ question, + grounding: AppAnswerGrounding { + verification_report_ref, + proof_status: proof.proof_status, + request_certified: proof.request_certified, + reusable_grounded_check_ids: proof.reusable_grounded_check_ids.clone(), + needs_review_check_ids: proof.needs_review_check_ids.clone(), + proof_limitations: proof.proof_limitations.clone(), + }, + app_status, + claims: decisions, + final_answer_claim_ids, + review_claim_ids, + blocked_claim_ids, + notes, + }) +} + fn has_capability_limit(report: &VerificationReport) -> bool { !report.capability_limits.is_empty() || report.warnings.contains(&WarningCode::CapabilityLimited) @@ -457,6 +793,170 @@ fn has_capability_limit(report: &VerificationReport) -> bool { .any(|check| check.warnings.contains(&WarningCode::CapabilityLimited)) } +fn app_answer_claim_decision( + claim: AppAnswerClaimInput, + reusable_check_ids: &HashSet<&str>, + known_check_ids: &HashSet<&str>, + seen_claim_ids: &mut HashSet<String>, +) -> Result<AppAnswerClaimDecision, AppAnswerReleaseError> { + if claim.id.is_empty() { + return Err(app_answer_release_error( + "claim id must be a non-empty string", + )); + } + if !seen_claim_ids.insert(claim.id.clone()) { + return Err(app_answer_release_error(format!( + "duplicate claim id: {}", + claim.id + ))); + } + if claim.text.is_empty() { + return Err(app_answer_release_error(format!( + "claim {} text must be a non-empty string", + claim.id + ))); + } + let mut seen_check_ids = HashSet::new(); + for check_id in &claim.check_ids { + if check_id.is_empty() { + return Err(app_answer_release_error(format!( + "claim {} check_ids must contain only non-empty strings", + claim.id + ))); + } + if !seen_check_ids.insert(check_id.as_str()) { + return Err(app_answer_release_error(format!( + "claim {} has duplicate check id: {}", + claim.id, check_id + ))); + } + } + + let citation_grounded = if claim.check_ids.is_empty() { + claim.citation_grounded.ok_or_else(|| { + app_answer_release_error(format!( + "claim {} without check_ids must set citation_grounded", + claim.id + 
)) + })? + } else { + let mut computed = true; + for check_id in &claim.check_ids { + if !known_check_ids.contains(check_id.as_str()) { + return Err(app_answer_release_error(format!( + "claim {} references unknown check id: {}", + claim.id, check_id + ))); + } + if !reusable_check_ids.contains(check_id.as_str()) { + computed = false; + } + } + if let Some(provided) = claim.citation_grounded { + if provided != computed { + return Err(app_answer_release_error(format!( + "citation_grounded for claim {} conflicts with proof summary", + claim.id + ))); + } + } + computed + }; + if claim.claim_type == AppClaimType::Unsupported && citation_grounded { + return Err(app_answer_release_error(format!( + "unsupported claim {} cannot be citation_grounded", + claim.id + ))); + } + + let (release_action, release_reason) = app_release_decision( + citation_grounded, + claim.question_relevance, + claim.claim_type, + ); + Ok(AppAnswerClaimDecision { + id: claim.id, + text: claim.text, + check_ids: claim.check_ids, + citation_grounded, + question_relevance: claim.question_relevance, + claim_type: claim.claim_type, + release_action, + release_reason, + }) +} + +fn app_release_decision( + citation_grounded: bool, + question_relevance: AppQuestionRelevance, + claim_type: AppClaimType, +) -> (AppReleaseAction, AppReleaseReason) { + if citation_grounded + && matches!( + question_relevance, + AppQuestionRelevance::DirectAnswer | AppQuestionRelevance::SupportsAnswer + ) + && claim_type == AppClaimType::SourceFact + { + return (AppReleaseAction::ShowFinal, AppReleaseReason::Certified); + } + if citation_grounded + && matches!( + question_relevance, + AppQuestionRelevance::DirectAnswer | AppQuestionRelevance::SupportsAnswer + ) + && claim_type == AppClaimType::Synthesis + { + return ( + AppReleaseAction::NeedsReview, + AppReleaseReason::SupportedSynthesisNeedsReview, + ); + } + if citation_grounded { + return ( + AppReleaseAction::Block, + AppReleaseReason::GroundedButIrrelevant, + ); + } + ( + 
AppReleaseAction::Block, + AppReleaseReason::CannotAnswerFromSources, + ) +} + +fn app_answer_status(claims: &[AppAnswerClaimDecision]) -> AppAnswerStatus { + let has_final = claims + .iter() + .any(|claim| claim.release_action == AppReleaseAction::ShowFinal); + let has_review = claims + .iter() + .any(|claim| claim.release_action == AppReleaseAction::NeedsReview); + let has_blocked = claims + .iter() + .any(|claim| claim.release_action == AppReleaseAction::Block); + + if has_final && !has_review && !has_blocked { + AppAnswerStatus::Certified + } else if has_final { + AppAnswerStatus::PartialCertified + } else if has_review { + AppAnswerStatus::SupportedSynthesisNeedsReview + } else if claims + .iter() + .any(|claim| claim.release_reason == AppReleaseReason::GroundedButIrrelevant) + { + AppAnswerStatus::GroundedButIrrelevant + } else { + AppAnswerStatus::CannotAnswerFromSources + } +} + +fn app_answer_release_error(message: impl Into<String>) -> AppAnswerReleaseError { + AppAnswerReleaseError { + message: message.into(), + } +} + /// Text normalization modes (config). v1 has exactly these two. 
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] #[serde(rename_all = "snake_case")] @@ -743,6 +1243,181 @@ mod tests { ); } + #[test] + fn app_answer_release_decision_applies_relevance_and_synthesis_policy() { + let proof = ProofSummary { + proof_status: ProofStatus::PartiallyVerified, + request_certified: false, + reusable_grounded_check_ids: vec![ + "v0001".to_string(), + "v0002".to_string(), + "v0003".to_string(), + ], + needs_review_check_ids: vec!["v0004".to_string()], + proof_limitations: vec![ProofLimitation::NonGroundedChecks], + }; + + let decision = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + vec![ + AppAnswerClaimInput { + id: "claim-revenue".to_string(), + text: "Revenue grew to $12.4M in Q3 2025.".to_string(), + check_ids: vec!["v0001".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::SourceFact, + }, + AppAnswerClaimInput { + id: "claim-background".to_string(), + text: "The company opened a European office.".to_string(), + check_ids: vec!["v0002".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::BackgroundOnly, + claim_type: AppClaimType::SourceFact, + }, + AppAnswerClaimInput { + id: "claim-synthesis".to_string(), + text: "Revenue growth was likely driven by enterprise expansion.".to_string(), + check_ids: vec!["v0001".to_string(), "v0003".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::SupportsAnswer, + claim_type: AppClaimType::Synthesis, + }, + AppAnswerClaimInput { + id: "claim-margin".to_string(), + text: "Gross margin improved in Q3 2025.".to_string(), + check_ids: vec!["v0004".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::Unsupported, + }, + ], + "reports/q3-verification.json", + vec!["application-owned relevance labels".to_string()], + ) + .unwrap(); + + 
assert_eq!( + decision.artifact_type, + "ethos.app_answer_release_decision.v1" + ); + assert_eq!(decision.app_status, AppAnswerStatus::PartialCertified); + assert_eq!( + decision.grounding.verification_report_ref, + "reports/q3-verification.json" + ); + assert_eq!(decision.final_answer_claim_ids, vec!["claim-revenue"]); + assert_eq!(decision.review_claim_ids, vec!["claim-synthesis"]); + assert_eq!( + decision.blocked_claim_ids, + vec!["claim-background", "claim-margin"] + ); + assert_eq!( + decision.claims[0].release_reason, + AppReleaseReason::Certified + ); + assert!(decision.claims[0].citation_grounded); + assert_eq!( + decision.claims[1].release_reason, + AppReleaseReason::GroundedButIrrelevant + ); + assert_eq!( + decision.claims[2].release_action, + AppReleaseAction::NeedsReview + ); + assert_eq!(decision.claims[2].check_ids, vec!["v0001", "v0003"]); + assert!(!decision.claims[3].citation_grounded); + assert_eq!( + decision.claims[3].release_reason, + AppReleaseReason::CannotAnswerFromSources + ); + assert_eq!(decision.notes, vec!["application-owned relevance labels"]); + + let json = serde_json::to_value(&decision).unwrap(); + assert_eq!( + json["artifact_type"], + serde_json::Value::String("ethos.app_answer_release_decision.v1".to_string()) + ); + assert_eq!(json["claims"][2]["check_ids"][1], "v0003"); + } + + #[test] + fn app_answer_release_decision_blocks_empty_source_answer() { + let proof = ProofSummary { + proof_status: ProofStatus::Unverified, + request_certified: false, + reusable_grounded_check_ids: Vec::new(), + needs_review_check_ids: Vec::new(), + proof_limitations: Vec::new(), + }; + + let decision = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + Vec::new(), + "verification_report.json", + Vec::new(), + ) + .unwrap(); + + assert_eq!( + decision.app_status, + AppAnswerStatus::CannotAnswerFromSources + ); + assert!(decision.final_answer_claim_ids.is_empty()); + assert!(decision.review_claim_ids.is_empty()); + 
assert!(decision.blocked_claim_ids.is_empty()); + } + + #[test] + fn app_answer_release_decision_rejects_conflicting_or_unknown_checks() { + let proof = ProofSummary { + proof_status: ProofStatus::PartiallyVerified, + request_certified: false, + reusable_grounded_check_ids: vec!["v0001".to_string()], + needs_review_check_ids: vec!["v0002".to_string()], + proof_limitations: vec![ProofLimitation::NonGroundedChecks], + }; + + let conflicting = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + vec![AppAnswerClaimInput { + id: "claim-bad".to_string(), + text: "Revenue grew.".to_string(), + check_ids: vec!["v0001".to_string()], + citation_grounded: Some(false), + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::SourceFact, + }], + "verification_report.json", + Vec::new(), + ) + .unwrap_err(); + assert!(conflicting + .message() + .contains("conflicts with proof summary")); + + let unknown = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + vec![AppAnswerClaimInput { + id: "claim-unknown".to_string(), + text: "Revenue grew.".to_string(), + check_ids: vec!["v9999".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::SourceFact, + }], + "verification_report.json", + Vec::new(), + ) + .unwrap_err(); + assert!(unknown.message().contains("unknown check id: v9999")); + } + #[test] fn report_example_round_trips() { let json = include_str!(concat!( diff --git a/docs/app-answer-release-contract.md b/docs/app-answer-release-contract.md index e622ca2..2cf7822 100644 --- a/docs/app-answer-release-contract.md +++ b/docs/app-answer-release-contract.md @@ -26,8 +26,9 @@ Applications that want a machine-readable wrapper decision envelope may use `schemas/examples/app-answer-release-decision.example.json` shows how to combine a derived Ethos proof summary with app-owned relevance and synthesis labels. 
This envelope is not a replacement for `verification_report.json`; it records an application release decision above the Ethos grounding -result. Python apps can build the same envelope with `app_answer_release_decision(...)` after they -have supplied relevance and synthesis labels. +result. Rust apps can build the same envelope with `derive_app_answer_release_decision(...)`, and +Python apps can build it with `app_answer_release_decision(...)`, after they have supplied +relevance and synthesis labels. If a wrapper exposes `invalid_request`, that status is a process or API envelope for malformed input, invalid configuration, adapter failure, or usage errors. It is not derived from a diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json index d876baa..9e0284b 100644 --- a/schemas/ethos-app-answer-release-decision.schema.json +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -149,7 +149,7 @@ "items": { "type": "string", "minLength": 1 }, "minItems": 1, "uniqueItems": true, - "description": "Multiple Ethos verification check IDs used by a synthesis claim." + "description": "One or more Ethos verification check IDs used by this claim." 
}, "citation_grounded": { "type": "boolean" }, "question_relevance": { "$ref": "#/$defs/question_relevance" }, From ff78faefc9d498f01fa702079dc7e82c733db1c8 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:30:11 +0530 Subject: [PATCH 05/10] Cover Python app release helper in contract target Signed-off-by: docushell-admin --- .github/scripts/test_app_answer_release_contract.py | 1 + Makefile | 1 + 2 files changed, 2 insertions(+) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 2e6cf87..6e33592 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -36,6 +36,7 @@ EXPECTED_TARGET_COMMANDS = [ "cargo test --locked -p ethos-doc-core --no-default-features --features verify-types app_answer_release", + "$(MAKE) python-surface-test PYTHON=$(PYTHON)", "$(PYTHON) schemas/validate_examples.py", "$(PYTHON) .github/scripts/test_app_answer_release_contract.py", "$(PYTHON) .github/scripts/claims_gate.py", diff --git a/Makefile b/Makefile index 2c8d32a..5263acb 100644 --- a/Makefile +++ b/Makefile @@ -64,6 +64,7 @@ evidence-anchor-v1-contract: app-answer-release-contract: cargo test --locked -p ethos-doc-core --no-default-features --features verify-types app_answer_release + $(MAKE) python-surface-test PYTHON=$(PYTHON) $(PYTHON) schemas/validate_examples.py $(PYTHON) .github/scripts/test_app_answer_release_contract.py $(PYTHON) .github/scripts/claims_gate.py From 3bb1e8e9c73c3312af17cd8ac7745e46d12d218d Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:33:55 +0530 Subject: [PATCH 06/10] Normalize app release check id output Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 43 +++++++++++++++++++ python/README.md | 2 +- python/ethos_pdf/_cli.py | 4 +- python/tests/test_cli_surface.py | 13 +++--- ...os-app-answer-release-decision.schema.json | 10 ----- 
.../app-answer-release-decision.example.json | 6 +-- 6 files changed, 55 insertions(+), 23 deletions(-) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 6e33592..3de73fd 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -18,6 +18,7 @@ from __future__ import annotations import json +import sys import unittest from pathlib import Path @@ -33,6 +34,7 @@ SCHEMAS_README = ROOT / "schemas/README.md" README = ROOT / "README.md" SPEC = ROOT / "SPEC.md" +PYTHON_PACKAGE = ROOT / "python" EXPECTED_TARGET_COMMANDS = [ "cargo test --locked -p ethos-doc-core --no-default-features --features verify-types app_answer_release", @@ -106,6 +108,9 @@ def test_schema_vocabulary_matches_contract(self) -> None: self.assertEqual(EXPECTED_CLAIM_TYPES, schema_enum("claim_type")) self.assertEqual(EXPECTED_RELEASE_ACTIONS, schema_enum("release_action")) self.assertEqual(EXPECTED_RELEASE_REASONS, schema_enum("release_reason")) + claim_properties = load_json(SCHEMA)["$defs"]["claim_decision"]["properties"] + self.assertNotIn("check_id", claim_properties) + self.assertIn("check_ids", claim_properties) def test_example_covers_relevance_and_synthesis_failure_modes(self) -> None: example = load_json(EXAMPLE) @@ -114,8 +119,12 @@ def test_example_covers_relevance_and_synthesis_failure_modes(self) -> None: self.assertEqual("partial_certified", example["app_status"]) self.assertEqual("partially_verified", example["grounding"]["proof_status"]) + for claim in claims.values(): + self.assertNotIn("check_id", claim) + irrelevant = claims["claim-office-background"] self.assertTrue(irrelevant["citation_grounded"]) + self.assertEqual(["v0002"], irrelevant["check_ids"]) self.assertEqual("background_only", irrelevant["question_relevance"]) self.assertEqual("block", irrelevant["release_action"]) self.assertEqual("grounded_but_irrelevant", 
irrelevant["release_reason"]) @@ -150,6 +159,40 @@ def test_final_answer_ids_are_only_certified_source_facts(self) -> None: for claim_id in example["blocked_claim_ids"]: self.assertEqual("block", claims[claim_id]["release_action"]) + def test_python_helper_emits_schema_conformant_decision(self) -> None: + if str(PYTHON_PACKAGE) not in sys.path: + sys.path.insert(0, str(PYTHON_PACKAGE)) + from ethos_pdf import app_answer_release_decision + + schema = load_json(SCHEMA) + decision = app_answer_release_decision( + "What was Q3 2025 revenue?", + { + "proof_status": "verified", + "request_certified": True, + "reusable_grounded_check_ids": ["v0001"], + "needs_review_check_ids": [], + "proof_limitations": [], + }, + [ + { + "id": "claim-revenue", + "text": "Revenue grew to $12.4M in Q3 2025.", + "check_id": "v0001", + "question_relevance": "direct_answer", + "claim_type": "source_fact", + } + ], + ) + errors = sorted( + Draft202012Validator(schema).iter_errors(decision), + key=lambda error: list(error.absolute_path), + ) + + self.assertEqual([], errors) + self.assertNotIn("check_id", decision["claims"][0]) + self.assertEqual(["v0001"], decision["claims"][0]["check_ids"]) + def test_schema_registry_validates_example(self) -> None: text = VALIDATE_EXAMPLES.read_text(encoding="utf-8") diff --git a/python/README.md b/python/README.md index 8114687..60fc50c 100644 --- a/python/README.md +++ b/python/README.md @@ -143,7 +143,7 @@ decision = app_answer_release_decision( { "id": "claim-revenue", "text": "Revenue grew to $12.4M in Q3 2025.", - "check_id": "v0001", + "check_ids": ["v0001"], "question_relevance": "direct_answer", "claim_type": "source_fact", } diff --git a/python/ethos_pdf/_cli.py b/python/ethos_pdf/_cli.py index e115075..5c0bdb4 100644 --- a/python/ethos_pdf/_cli.py +++ b/python/ethos_pdf/_cli.py @@ -887,9 +887,7 @@ def _app_claim_decision( "release_action": release_action, "release_reason": release_reason, } - if "check_id" in claim: - decision["check_id"] = 
check_ids[0] - elif "check_ids" in claim: + if check_ids: decision["check_ids"] = check_ids return decision diff --git a/python/tests/test_cli_surface.py b/python/tests/test_cli_surface.py index fee256d..fd6d634 100644 --- a/python/tests/test_cli_surface.py +++ b/python/tests/test_cli_surface.py @@ -595,14 +595,14 @@ def test_app_answer_release_decision_applies_relevance_and_synthesis_policy( { "id": "claim-revenue", "text": "Revenue grew to $12.4M in Q3 2025.", - "check_id": "v0001", + "check_ids": ["v0001"], "question_relevance": "direct_answer", "claim_type": "source_fact", }, { "id": "claim-background", "text": "The company opened a European office.", - "check_id": "v0002", + "check_ids": ["v0002"], "question_relevance": "background_only", "claim_type": "source_fact", }, @@ -616,7 +616,7 @@ def test_app_answer_release_decision_applies_relevance_and_synthesis_policy( { "id": "claim-margin", "text": "Gross margin improved in Q3 2025.", - "check_id": "v0004", + "check_ids": ["v0004"], "question_relevance": "direct_answer", "claim_type": "unsupported", }, @@ -635,6 +635,7 @@ def test_app_answer_release_decision_applies_relevance_and_synthesis_policy( self.assertEqual(result["review_claim_ids"], ["claim-synthesis"]) self.assertEqual(result["blocked_claim_ids"], ["claim-background", "claim-margin"]) self.assertEqual(result["claims"][0]["release_reason"], "certified") + self.assertEqual(result["claims"][0]["check_ids"], ["v0001"]) self.assertTrue(result["claims"][0]["citation_grounded"]) self.assertEqual(result["claims"][1]["release_reason"], "grounded_but_irrelevant") self.assertEqual(result["claims"][2]["release_action"], "needs_review") @@ -667,7 +668,7 @@ def test_app_answer_release_decision_accepts_verification_report(self) -> None: { "id": "claim-revenue", "text": "Revenue grew to $12.4M in Q3 2025.", - "check_id": "v0001", + "check_ids": ["v0001"], "question_relevance": "direct_answer", "claim_type": "source_fact", } @@ -717,7 +718,7 @@ def 
test_app_answer_release_decision_rejects_inconsistent_or_unknown_checks( { "id": "claim-bad", "text": "Revenue grew.", - "check_id": "v0001", + "check_ids": ["v0001"], "citation_grounded": False, "question_relevance": "direct_answer", "claim_type": "source_fact", @@ -733,7 +734,7 @@ def test_app_answer_release_decision_rejects_inconsistent_or_unknown_checks( { "id": "claim-unknown", "text": "Revenue grew.", - "check_id": "v9999", + "check_ids": ["v9999"], "question_relevance": "direct_answer", "claim_type": "source_fact", } diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json index 9e0284b..f4e421f 100644 --- a/schemas/ethos-app-answer-release-decision.schema.json +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -139,11 +139,6 @@ "properties": { "id": { "type": "string", "minLength": 1 }, "text": { "type": "string", "minLength": 1 }, - "check_id": { - "type": "string", - "minLength": 1, - "description": "Ethos verification check ID when the claim came from a verification report check." 
- }, "check_ids": { "type": "array", "items": { "type": "string", "minLength": 1 }, @@ -158,11 +153,6 @@ "release_reason": { "$ref": "#/$defs/release_reason" } }, "allOf": [ - { - "not": { - "required": ["check_id", "check_ids"] - } - }, { "if": { "properties": { "release_reason": { "const": "certified" } }, diff --git a/schemas/examples/app-answer-release-decision.example.json b/schemas/examples/app-answer-release-decision.example.json index 4c7cb75..2b35115 100644 --- a/schemas/examples/app-answer-release-decision.example.json +++ b/schemas/examples/app-answer-release-decision.example.json @@ -15,7 +15,7 @@ { "id": "claim-revenue", "text": "Revenue grew to $12.4M in Q3 2025.", - "check_id": "v0001", + "check_ids": ["v0001"], "citation_grounded": true, "question_relevance": "direct_answer", "claim_type": "source_fact", @@ -25,7 +25,7 @@ { "id": "claim-office-background", "text": "The company opened a European office.", - "check_id": "v0002", + "check_ids": ["v0002"], "citation_grounded": true, "question_relevance": "background_only", "claim_type": "source_fact", @@ -45,7 +45,7 @@ { "id": "claim-margin", "text": "Gross margin improved in Q3 2025.", - "check_id": "v0004", + "check_ids": ["v0004"], "citation_grounded": false, "question_relevance": "direct_answer", "claim_type": "unsupported", From 93547324e75cfbbce47282a563345b1106fea778 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:41:02 +0530 Subject: [PATCH 07/10] Tighten app release schema invariants Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 33 +++++++++++++++++++ ...os-app-answer-release-decision.schema.json | 20 ++++++++--- 2 files changed, 48 insertions(+), 5 deletions(-) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 3de73fd..0d21baf 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -193,6 +193,39 
@@ def test_python_helper_emits_schema_conformant_decision(self) -> None: self.assertNotIn("check_id", decision["claims"][0]) self.assertEqual(["v0001"], decision["claims"][0]["check_ids"]) + def test_schema_rejects_grounded_unsupported_claims(self) -> None: + schema = load_json(SCHEMA) + example = load_json(EXAMPLE) + claim = example["claims"][-1] + claim["citation_grounded"] = True + + errors = list(Draft202012Validator(schema).iter_errors(example)) + + self.assertTrue( + any( + list(error.absolute_path) == ["claims", 3, "citation_grounded"] + for error in errors + ), + errors, + ) + + def test_schema_requires_cannot_answer_claims_to_be_ungrounded(self) -> None: + schema = load_json(SCHEMA) + example = load_json(EXAMPLE) + claim = example["claims"][0] + claim["release_action"] = "block" + claim["release_reason"] = "cannot_answer_from_sources" + + errors = list(Draft202012Validator(schema).iter_errors(example)) + + self.assertTrue( + any( + list(error.absolute_path) == ["claims", 0, "citation_grounded"] + for error in errors + ), + errors, + ) + def test_schema_registry_validates_example(self) -> None: text = VALIDATE_EXAMPLES.read_text(encoding="utf-8") diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json index f4e421f..87f29b1 100644 --- a/schemas/ethos-app-answer-release-decision.schema.json +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -196,6 +196,19 @@ } } }, + { + "if": { + "properties": { "claim_type": { "const": "unsupported" } }, + "required": ["claim_type"] + }, + "then": { + "properties": { + "citation_grounded": { "const": false }, + "release_action": { "const": "block" }, + "release_reason": { "const": "cannot_answer_from_sources" } + } + } + }, { "if": { "properties": { "release_reason": { "const": "cannot_answer_from_sources" } }, @@ -203,12 +216,9 @@ }, "then": { "properties": { + "citation_grounded": { "const": false }, "release_action": { "const": "block" } - 
}, - "anyOf": [ - { "properties": { "citation_grounded": { "const": false } } }, - { "properties": { "claim_type": { "const": "unsupported" } } } - ] + } } } ] From 9e1e7347e429ebdead74e50663c0f4478bfecf29 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:47:15 +0530 Subject: [PATCH 08/10] Reject duplicate app release claim ids Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 13 ++++++ crates/ethos-core/README.md | 3 +- crates/ethos-core/src/verify_types.rs | 41 +++++++++++++++++++ docs/app-answer-release-contract.md | 3 ++ python/README.md | 3 +- python/ethos_pdf/_cli.py | 13 ++++-- python/tests/test_cli_surface.py | 31 ++++++++++++++ ...os-app-answer-release-decision.schema.json | 10 ++++- 8 files changed, 109 insertions(+), 8 deletions(-) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 0d21baf..3c90732 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -141,6 +141,19 @@ def test_example_covers_relevance_and_synthesis_failure_modes(self) -> None: self.assertEqual("unsupported", unsupported["claim_type"]) self.assertEqual("cannot_answer_from_sources", unsupported["release_reason"]) + def test_example_claim_ids_are_unique_and_release_lists_are_unambiguous(self) -> None: + example = load_json(EXAMPLE) + claim_ids = [claim["id"] for claim in example["claims"]] + release_ids = ( + example["final_answer_claim_ids"] + + example["review_claim_ids"] + + example["blocked_claim_ids"] + ) + + self.assertEqual(len(claim_ids), len(set(claim_ids))) + self.assertEqual(sorted(claim_ids), sorted(release_ids)) + self.assertEqual(len(release_ids), len(set(release_ids))) + def test_final_answer_ids_are_only_certified_source_facts(self) -> None: example = load_json(EXAMPLE) claims = claim_by_id(example) diff --git a/crates/ethos-core/README.md b/crates/ethos-core/README.md index 
685acdf..c8518f4 100644 --- a/crates/ethos-core/README.md +++ b/crates/ethos-core/README.md @@ -28,7 +28,8 @@ certified, partially reusable, or unverified. The same feature also exposes `derive_app_answer_release_decision(...)` for applications that have already labeled question relevance and synthesis. That helper builds the non-canonical app answer release envelope documented in `docs/app-answer-release-contract.md`; it does not replace -`verification_report.json` and does not judge relevance itself. +`verification_report.json`, does not judge relevance itself, and rejects duplicate claim IDs before +building release lists. ## Publication Boundary diff --git a/crates/ethos-core/src/verify_types.rs b/crates/ethos-core/src/verify_types.rs index 9cb6178..181af50 100644 --- a/crates/ethos-core/src/verify_types.rs +++ b/crates/ethos-core/src/verify_types.rs @@ -1418,6 +1418,47 @@ mod tests { assert!(unknown.message().contains("unknown check id: v9999")); } + #[test] + fn app_answer_release_decision_rejects_duplicate_claim_ids() { + let proof = ProofSummary { + proof_status: ProofStatus::PartiallyVerified, + request_certified: false, + reusable_grounded_check_ids: vec!["v0001".to_string(), "v0002".to_string()], + needs_review_check_ids: Vec::new(), + proof_limitations: Vec::new(), + }; + + let duplicate = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + vec![ + AppAnswerClaimInput { + id: "claim-revenue".to_string(), + text: "Revenue grew.".to_string(), + check_ids: vec!["v0001".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::SourceFact, + }, + AppAnswerClaimInput { + id: "claim-revenue".to_string(), + text: "Revenue increased.".to_string(), + check_ids: vec!["v0002".to_string()], + citation_grounded: None, + question_relevance: AppQuestionRelevance::SupportsAnswer, + claim_type: AppClaimType::SourceFact, + }, + ], + "verification_report.json", + Vec::new(), + ) 
+ .unwrap_err(); + + assert!(duplicate + .message() + .contains("duplicate claim id: claim-revenue")); + } + #[test] fn report_example_round_trips() { let json = include_str!(concat!( diff --git a/docs/app-answer-release-contract.md b/docs/app-answer-release-contract.md index 2cf7822..f07e880 100644 --- a/docs/app-answer-release-contract.md +++ b/docs/app-answer-release-contract.md @@ -88,6 +88,9 @@ Suggested `claim_type` values: These labels may come from application policy, a reviewed model output schema, human review, or a separate evaluator. They are outside the canonical Ethos verification report. +Each claim must have a stable `id` that is unique within the wrapper decision. The helper APIs +reject duplicate claim IDs so the release lists cannot point to ambiguous claim text. + Suggested `release_action` values for a wrapper decision envelope: - `show_final`: release the claim in the final answer. diff --git a/python/README.md b/python/README.md index 60fc50c..4ede0ad 100644 --- a/python/README.md +++ b/python/README.md @@ -154,7 +154,8 @@ print(decision["app_status"]) The helper does not judge relevance or synthesis. Callers supply those labels; the helper applies the release rule and requires referenced Ethos check IDs to be reusable before a claim can enter -the final answer. +the final answer. It also rejects duplicate claim IDs so `final_answer_claim_ids`, +`review_claim_ids`, and `blocked_claim_ids` stay unambiguous. 
Run the focused tests with: diff --git a/python/ethos_pdf/_cli.py b/python/ethos_pdf/_cli.py index 5c0bdb4..e01930e 100644 --- a/python/ethos_pdf/_cli.py +++ b/python/ethos_pdf/_cli.py @@ -725,10 +725,15 @@ def app_answer_release_decision( reusable = set(grounding["reusable_grounded_check_ids"]) needs_review = set(grounding["needs_review_check_ids"]) - decisions = [ - _app_claim_decision(claim, reusable, needs_review) - for claim in claims - ] + seen_claim_ids: set[str] = set() + decisions = [] + for claim in claims: + decision = _app_claim_decision(claim, reusable, needs_review) + claim_id = decision["id"] + if claim_id in seen_claim_ids: + raise ValueError(f"duplicate claim id: {claim_id}") + seen_claim_ids.add(claim_id) + decisions.append(decision) final_ids = [ claim["id"] for claim in decisions if claim["release_action"] == "show_final" ] diff --git a/python/tests/test_cli_surface.py b/python/tests/test_cli_surface.py index fd6d634..7a40887 100644 --- a/python/tests/test_cli_surface.py +++ b/python/tests/test_cli_surface.py @@ -741,6 +741,37 @@ def test_app_answer_release_decision_rejects_inconsistent_or_unknown_checks( ], ) + def test_app_answer_release_decision_rejects_duplicate_claim_ids(self) -> None: + summary = { + "proof_status": "partially_verified", + "request_certified": False, + "reusable_grounded_check_ids": ["v0001", "v0002"], + "needs_review_check_ids": [], + "proof_limitations": [], + } + + with self.assertRaisesRegex(ValueError, "duplicate claim id: claim-revenue"): + app_answer_release_decision( + "What was Q3 2025 revenue?", + summary, + [ + { + "id": "claim-revenue", + "text": "Revenue grew.", + "check_ids": ["v0001"], + "question_relevance": "direct_answer", + "claim_type": "source_fact", + }, + { + "id": "claim-revenue", + "text": "Revenue increased.", + "check_ids": ["v0002"], + "question_relevance": "supports_answer", + "claim_type": "source_fact", + }, + ], + ) + def test_anchor_maps_source_evidence_refs_and_grounding(self) -> None: 
result = anchor( self.document, diff --git a/schemas/ethos-app-answer-release-decision.schema.json b/schemas/ethos-app-answer-release-decision.schema.json index 87f29b1..67ffea6 100644 --- a/schemas/ethos-app-answer-release-decision.schema.json +++ b/schemas/ethos-app-answer-release-decision.schema.json @@ -63,7 +63,9 @@ "app_status": { "$ref": "#/$defs/app_status" }, "claims": { "type": "array", - "items": { "$ref": "#/$defs/claim_decision" } + "items": { "$ref": "#/$defs/claim_decision" }, + "uniqueItems": true, + "description": "Application claim decisions. Claim ids must be unique within this array; helper APIs enforce id-level uniqueness." }, "final_answer_claim_ids": { "type": "array", @@ -137,7 +139,11 @@ ], "additionalProperties": false, "properties": { - "id": { "type": "string", "minLength": 1 }, + "id": { + "type": "string", + "minLength": 1, + "description": "Stable application claim id. Each id must be unique within the claims array." + }, "text": { "type": "string", "minLength": 1 }, "check_ids": { "type": "array", From 340b78f41bd2439e8f1ffb598bfd8c307ecb26ac Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:50:09 +0530 Subject: [PATCH 09/10] Add app release example conformance tests Signed-off-by: docushell-admin --- .../test_app_answer_release_contract.py | 34 ++++++++++ crates/ethos-core/src/verify_types.rs | 68 +++++++++++++++++++ 2 files changed, 102 insertions(+) diff --git a/.github/scripts/test_app_answer_release_contract.py b/.github/scripts/test_app_answer_release_contract.py index 3c90732..a1ee593 100644 --- a/.github/scripts/test_app_answer_release_contract.py +++ b/.github/scripts/test_app_answer_release_contract.py @@ -83,6 +83,21 @@ def claim_by_id(example: dict) -> dict[str, dict]: return {claim["id"]: claim for claim in example["claims"]} +def claim_inputs(example: dict) -> list[dict]: + fields = [ + "id", + "text", + "check_ids", + "citation_grounded", + "question_relevance", + "claim_type", + ] + return [ + 
{field: claim[field] for field in fields if field in claim} + for claim in example["claims"] + ] + + def normalized_markdown(path: Path) -> str: return " ".join(path.read_text(encoding="utf-8").split()) @@ -206,6 +221,25 @@ def test_python_helper_emits_schema_conformant_decision(self) -> None: self.assertNotIn("check_id", decision["claims"][0]) self.assertEqual(["v0001"], decision["claims"][0]["check_ids"]) + def test_python_helper_reproduces_documented_example(self) -> None: + if str(PYTHON_PACKAGE) not in sys.path: + sys.path.insert(0, str(PYTHON_PACKAGE)) + from ethos_pdf import app_answer_release_decision + + example = load_json(EXAMPLE) + grounding = dict(example["grounding"]) + verification_report_ref = grounding.pop("verification_report_ref") + + decision = app_answer_release_decision( + example["question"], + grounding, + claim_inputs(example), + verification_report_ref=verification_report_ref, + notes=example["notes"], + ) + + self.assertEqual(example, decision) + def test_schema_rejects_grounded_unsupported_claims(self) -> None: schema = load_json(SCHEMA) example = load_json(EXAMPLE) diff --git a/crates/ethos-core/src/verify_types.rs b/crates/ethos-core/src/verify_types.rs index 181af50..a521d28 100644 --- a/crates/ethos-core/src/verify_types.rs +++ b/crates/ethos-core/src/verify_types.rs @@ -1343,6 +1343,74 @@ mod tests { assert_eq!(json["claims"][2]["check_ids"][1], "v0003"); } + #[test] + fn app_answer_release_decision_reproduces_documented_example() { + let expected: serde_json::Value = serde_json::from_str(include_str!(concat!( + env!("CARGO_MANIFEST_DIR"), + "/../../schemas/examples/app-answer-release-decision.example.json" + ))) + .unwrap(); + let proof = ProofSummary { + proof_status: ProofStatus::PartiallyVerified, + request_certified: false, + reusable_grounded_check_ids: vec![ + "v0001".to_string(), + "v0002".to_string(), + "v0003".to_string(), + ], + needs_review_check_ids: vec!["v0004".to_string()], + proof_limitations: 
vec![ProofLimitation::NonGroundedChecks], + }; + + let decision = derive_app_answer_release_decision( + "What was Q3 2025 revenue?", + &proof, + vec![ + AppAnswerClaimInput { + id: "claim-revenue".to_string(), + text: "Revenue grew to $12.4M in Q3 2025.".to_string(), + check_ids: vec!["v0001".to_string()], + citation_grounded: Some(true), + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::SourceFact, + }, + AppAnswerClaimInput { + id: "claim-office-background".to_string(), + text: "The company opened a European office.".to_string(), + check_ids: vec!["v0002".to_string()], + citation_grounded: Some(true), + question_relevance: AppQuestionRelevance::BackgroundOnly, + claim_type: AppClaimType::SourceFact, + }, + AppAnswerClaimInput { + id: "claim-growth-driver".to_string(), + text: "Q3 revenue growth was likely driven by enterprise expansion." + .to_string(), + check_ids: vec!["v0001".to_string(), "v0003".to_string()], + citation_grounded: Some(true), + question_relevance: AppQuestionRelevance::SupportsAnswer, + claim_type: AppClaimType::Synthesis, + }, + AppAnswerClaimInput { + id: "claim-margin".to_string(), + text: "Gross margin improved in Q3 2025.".to_string(), + check_ids: vec!["v0004".to_string()], + citation_grounded: Some(false), + question_relevance: AppQuestionRelevance::DirectAnswer, + claim_type: AppClaimType::Unsupported, + }, + ], + "verification_report.json", + vec![ + "This app-layer envelope is not verification_report.json; it records release policy above Ethos grounding." 
+ .to_string(), + ], + ) + .unwrap(); + + assert_eq!(serde_json::to_value(&decision).unwrap(), expected); + } + #[test] fn app_answer_release_decision_blocks_empty_source_answer() { let proof = ProofSummary { From 75b8b57182579d0f110a2446286183956db68c95 Mon Sep 17 00:00:00 2001 From: docushell-admin Date: Tue, 30 Jun 2026 15:54:13 +0530 Subject: [PATCH 10/10] Run app release contract in CI Signed-off-by: docushell-admin --- .github/scripts/test_ci_workflow.py | 6 ++++++ .github/workflows/ci.yml | 2 ++ 2 files changed, 8 insertions(+) diff --git a/.github/scripts/test_ci_workflow.py b/.github/scripts/test_ci_workflow.py index 3838718..bdefae7 100644 --- a/.github/scripts/test_ci_workflow.py +++ b/.github/scripts/test_ci_workflow.py @@ -98,6 +98,10 @@ def test_ci_workflow_guard_is_run_by_ci(self) -> None: ) self.assertLess( text.index("python3 .github/scripts/test_evidence_anchor_v1_contract.py"), + text.index("make app-answer-release-contract PYTHON=python3"), + ) + self.assertLess( + text.index("make app-answer-release-contract PYTHON=python3"), text.index("python3 .github/scripts/test_milestone_d_internal_contracts.py"), ) self.assertIn("python3 .github/scripts/test_evidence_anchor_v1_contract.py", text) @@ -105,6 +109,8 @@ def test_ci_workflow_guard_is_run_by_ci(self) -> None: 1, text.count("python3 .github/scripts/test_evidence_anchor_v1_contract.py"), ) + self.assertIn("make app-answer-release-contract PYTHON=python3", text) + self.assertEqual(1, text.count("make app-answer-release-contract PYTHON=python3")) self.assertIn("python3 .github/scripts/test_milestone_d_internal_contracts.py", text) self.assertIn("python3 .github/scripts/test_milestone_b_closeout_record.py", text) self.assertIn("python3 .github/scripts/test_milestone_c_closeout_record.py", text) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 1d92f7f..562564a 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -86,6 +86,8 @@ jobs: run: python3 
.github/scripts/test_h2_source_snapshot_closeout.py - name: Evidence anchor v1 contract guard tests run: python3 .github/scripts/test_evidence_anchor_v1_contract.py + - name: App answer release contract target + run: make app-answer-release-contract PYTHON=python3 - name: Milestone D internal contract target tests run: python3 .github/scripts/test_milestone_d_internal_contracts.py - name: Milestone B closeout validation record tests