diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6c1f698..0f14a98 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,19 @@ All notable changes to Agents.KT are documented here. The format follows [Keep a
 
 ## [Unreleased]
 
+## [0.8.2] - 2026-07-01
+
+_Standards & trust hardening._ Hardens the experimental x402 buyer (mandatory guardrails, cross-payment
+session limits, a signer seam, CAIP-2 network ids) and adds machine-readable release-truth metadata.
+
+### Added — release-truth: a single source of truth for version/provider/protocol claims (#4735)
+
+An external audit found the advertised version + provider + protocol claims drifting across the README,
+roadmap, comparison page, and POM (0.8.1 shipped while several surfaces still said 0.8.0/0.7.2). New
+`release-metadata.yaml` is now the one place those claims live, and `ReleaseMetadataConsistencyTest` pins
+the in-repo surfaces to it — a release that edits only the metadata fails the build until the prose moves.
+Kills the drift class the de-slop epic (#3083) flagged.
+
 ### Changed — x402 buyer trust hardening: guardrails are now mandatory and bind more (#4528)
 
 An external audit flagged that the "guardrails-first" buyer had **optional** guardrails (an empty
diff --git a/README.md b/README.md
index 32e7e47..952e89d 100644
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ The 0.6–0.7 line turns those boundaries into reviewable evidence: deterministi
 ```kotlin
 // build.gradle.kts
 dependencies {
-    implementation("ai.deep-code:agents-kt:0.8.1")
+    implementation("ai.deep-code:agents-kt:0.8.2")
 }
 ```
 
@@ -328,7 +328,7 @@ Topical guides:
 
 ## Current Release
 
-The latest published release is `0.8.0` — **interoperable, multimodal agents, with capability grants** (the largest minor since 0.5.0). (Between releases, `main` carries unreleased work under a `-SNAPSHOT` version; see the [CHANGELOG](CHANGELOG.md) for what's landed since.) Highlights: **A2A v1** (agents are A2A servers + typed clients), full **multimodal** (vision across Claude/OpenAI/Ollama, audio STT/TTS tools with self-hosted Whisper/Qwen, image generation), an **eighth model provider — Google Gemini** (`model { gemini("gemini-2.5-flash") }`, a full from-scratch adapter with native SSE, function calling, `responseJsonSchema` decoding, and thought-summary reasoning), **capability grants** (`grants { allow(...); confirm(...) }` — `confirm` tools need the granting agent's authorization, fail-closed), **agent.json** serialization, the **RAG** seam, and richer **composition** (`handoff` / `firstOf` / `.speculative` / `loopUntil` / aggregators / forum captains) plus HITL gates and an eval harness. The Layer-2 **sandbox backends** (Docker / egress proxy / read confinement) move to **0.9.0**. Additive only — drop-in on the 0.x line. Dependency line: Kotlin 2.4.0.
+The latest published release is `0.8.2` — **standards & trust hardening** (on the 0.8.1 agentic-web base, itself on the 0.8.0 base of interoperable, multimodal agents with capability grants). (Between releases, `main` carries unreleased work under a `-SNAPSHOT` version; see the [CHANGELOG](CHANGELOG.md) for what's landed since.) **0.8.2 hardened** the *experimental* **x402** buyer: guardrails are now **mandatory** (`X402SpendPolicy` is no longer defaultable), bind tighter (`allowedAssets` / `allowedResourceOrigins` / clamped authorization lifetime), select offers deterministically (cheapest permitted, not the seller's first), enforce **cross-payment session limits** (aggregate spend caps via `X402SpendStore`), and sign through an **`X402Signer`** seam (KMS / HSM / scoped session key — permanent keys never touch the app heap); it also accepts **CAIP-2 network ids** (`eip155:84532`) as a v2-interop step on the v1 wire. Plus **release-truth** — a machine-readable `release-metadata.yaml` as the single source of truth for version/provider/protocol claims, guarded by a consistency test so the surfaces can't drift again. **0.8.1 added** the agentic-web surfaces — **AGNTCY** (OASF export/import + Identity verify + DIR client), **AG-UI** serving (`AgUiServer.from(agent)`, incl. live REASONING + `TOOL_CALL_RESULT`), **NLWeb** serve + search, **x402** seller gate + the *experimental* guardrails-first buyer, audit-ledger misbehaviour folding, and default transient-network retry. On the **0.8.0** base: **A2A v1** (agents are A2A servers + typed clients), full **multimodal** (vision across Claude/OpenAI/Ollama, audio STT/TTS tools with self-hosted Whisper/Qwen, image generation), an **eighth model provider — Google Gemini** (`model { gemini("gemini-2.5-flash") }`, a full from-scratch adapter with native SSE, function calling, `responseJsonSchema` decoding, and thought-summary reasoning), **capability grants** (`grants { allow(...); confirm(...) }` — `confirm` tools need the granting agent's authorization, fail-closed), **agent.json** serialization, the **RAG** seam, and richer **composition** (`handoff` / `firstOf` / `.speculative` / `loopUntil` / aggregators / forum captains) plus HITL gates and an eval harness. The Layer-2 **sandbox backends** (Docker / egress proxy / read confinement) move to **0.9.0**. Additive only — drop-in on the 0.x line. Dependency line: Kotlin 2.4.0.
 
 **0.7.23 — maintainability + a model-error policy.** A behavior-preserving, drop-in release that closes the bulk of the code-smell remediation epic (#2790) and finishes the **AgenticLoop decomposition** begun in 0.7.21: the `Agent` god class splits into `InterceptorChain` + `ListenerRegistry` (#2793); `McpServer` into HTTP intake + a transport-agnostic `McpDispatcher` (#2795); the five composition operators' duplicated streaming-session scaffold collapses into one `agentSessionScope` (#2797); `LiveShow`'s banner + thread-shadowing spinner become their own units (#2798); `Forum`/`Branch` lose their dual-path duplication (#2802); a typed `GenerableCodec` seam collapses the `@Generable` casts to one boundary (#2803); and `executeAgentic`'s last setup block extracts as `resolveAllowedTools` (#3423). The one **new public API** is **`onLLMError`** (#3508): when a model is configured, a failed model call in the agentic loop fails fast and loud by default, with `onLLMError { e -> RespondWith(fallback) | Rethrow }` as the opt-in recovery hook. The detekt-baseline ratchet fell 423 → 415 and `@Suppress("UNCHECKED_CAST")` 42 → 30 across the release. Only #2791 (the turn-loop core of `executeAgentic`) remains open in the epic.
 
@@ -338,7 +338,7 @@ The latest published release is `0.8.0` — **interoperable, multimodal agents,
 
 **0.7.1 — verify-gate hardening.** The manifest `verify` gate compares policy **sets** (not coarse scores), so it catches widenings it previously missed (a host added within `hosts` mode, or a write glob broadened without changing the count), keyed per `agentName.toolName`; plus docs/KDoc drift fixes.
 
-**0.7.0 — Boundaries you can enforce externally.** The 0.6 line made tool policies declarable and auditable; 0.7.0 makes them **enforced**. A tool's declared `ToolPolicy` now constrains it at runtime: Layer 1 (in-JVM filesystem-argument gate, #2890) plus Layer 2 OS sandboxing (#1916) — macOS Seatbelt, Linux bubblewrap, a firejail setuid fallback, and a plain-`ProcessBuilder` + loud `UNCONFINED` warning where no tool is present — confine subprocess-shaped tools to their declared write roots, an environment allow-list, a working directory, and a default-deny network. And the deterministic permission manifest is reachable **outside Gradle** via the standalone [`agents-kt` CLI](docs/cli.md) (`generate` / `inspect` / `verify`, #1923) — a drop-in CI gate that fails when a change widens a capability boundary. *Deferred to 0.8: `WasmSandbox` (#2894), `DockerSandbox` (#2895), the network hostname-allowlist proxy (#2893), and the `grants { }` structure DSL.*
+**0.7.0 — Boundaries you can enforce externally.** The 0.6 line made tool policies declarable and auditable; 0.7.0 makes them **enforced**. A tool's declared `ToolPolicy` now constrains it at runtime: Layer 1 (in-JVM filesystem-argument gate, #2890) plus Layer 2 OS sandboxing (#1916) — macOS Seatbelt, Linux bubblewrap, a firejail setuid fallback, and a plain-`ProcessBuilder` + loud `UNCONFINED` warning where no tool is present — confine subprocess-shaped tools to their declared write roots, an environment allow-list, a working directory, and a default-deny network. And the deterministic permission manifest is reachable **outside Gradle** via the standalone [`agents-kt` CLI](docs/cli.md) (`generate` / `inspect` / `verify`, #1923) — a drop-in CI gate that fails when a change widens a capability boundary. *(Of the Layer-2 backends pencilled for later: the `grants { }` DSL shipped in 0.8.0 (#4545); `WasmSandbox` (#2894) was closed won't-do — embedded-WASM-for-tools isn't rational; `DockerSandbox` (#2895) and the network hostname-allowlist proxy (#2893) are now planned for 0.9.0.)*
 
 **0.6.6 — Maintainability + cancellation (#2863 + epic #2790).** Session catch now distinguishes `CancellationException` (propagate per structured concurrency — no synthetic `Failed` event) from `TimeoutCancellationException` (real failure — keeps surfacing as `Failed`), plus 10 internal refactors (detekt baseline, `JsonEscape`/`JsonRpc` consolidation, `HttpModelClientSupport.sendBounded`, …). Additive only — every 0.6.5 caller compiles and runs unchanged.
 
diff --git a/build.gradle.kts b/build.gradle.kts
index 382a282..fa319bb 100644
--- a/build.gradle.kts
+++ b/build.gradle.kts
@@ -16,7 +16,7 @@ plugins {
 }
 
 group = "ai.deep-code"
-version = "0.8.2-SNAPSHOT"
+version = "0.8.2"
 
 repositories {
     mavenCentral()
@@ -809,7 +809,12 @@ publishing {
 
             pom {
                 name.set("Agents.KT")
-                description.set("Typed Kotlin DSL framework for AI agent systems")
+                description.set(
+                    "The typed agent runtime for the JVM — bounded agent systems where authority is " +
+                        "explicit before execution, enforced during execution, and evidenced afterward. " +
+                        "Typed Agent contracts, least-privilege tools, permission manifests, and " +
+                        "audit evidence by construction. MCP / A2A / AG-UI / NLWeb / x402 native.",
+                )
                 url.set("https://github.com/Deep-CodeAI/Agents.KT")
 
                 licenses {
diff --git a/docs/comparison.md b/docs/comparison.md
index 919c977..ecff1f7 100644
--- a/docs/comparison.md
+++ b/docs/comparison.md
@@ -139,7 +139,7 @@ A few shortcuts that point at one framework over the others:
 
 ## Status notes (2026-06)
 
-- **Agents.KT 0.8.0 (latest release)** — interoperable, multimodal agents with capability grants: **A2A v1** (server + typed client), full **multimodal** (audio STT/TTS, vision, image generation), the **RAG** seam, richer **composition** (`handoff` / `firstOf` / `.speculative` / `loopUntil` / aggregators / forum captains), **HITL** gates + an **eval** harness, an eighth model provider (**Google Gemini**), **agent.json** serialization, and the **capability-grants** DSL (`grants { allow / confirm }`) — on top of 0.7.x's runtime ToolPolicy enforcement (in-JVM gate + OS sandbox), the standalone `agents-kt` CLI, tamper-evident audit ledger, fail-loud skill routing, the `onLLMError` policy, the Perplexity connector + `perplexitySearch` grounded tool, and 0.6.0's permission manifests / JSONL audit export / OTel-LangSmith-Langfuse bridges / constrained decoding. *Deferred to 0.9.0:* the remaining Layer-2 sandbox backends (Docker / egress proxy / read confinement).
+- **Agents.KT 0.8.2 (latest release)** — *standards & trust hardening.* Hardens the *experimental* **x402** buyer — guardrails are now **mandatory** (`X402SpendPolicy` no longer defaultable), bind tighter (`allowedAssets` / `allowedResourceOrigins` / clamped authorization lifetime), select the cheapest permitted offer, enforce **cross-payment session limits**, and sign through an **`X402Signer`** seam (KMS / HSM / scoped session key) — and accepts **CAIP-2 network ids** on the v1 wire; plus machine-readable **release-truth** metadata guarding version/provider/protocol claims from drift. On the **0.8.1** agentic-web base — the interop surfaces — **AGNTCY** (OASF export/import + Identity-badge verify + DIR Store/Search/Routing client), **AG-UI** serving (`AgUiServer.from(agent)` → typed SSE incl. live REASONING + `TOOL_CALL_RESULT`), **NLWeb** serve + search (an `/ask` compatibility slice), and **x402** payments (seller gate + *experimental* guardrails-first buyer) — plus audit-ledger misbehaviour folding and default transient-network retry. On the **0.8.0** base: interoperable, multimodal agents with capability grants — **A2A v1** (server + typed client), full **multimodal** (audio STT/TTS, vision, image generation), the **RAG** seam, richer **composition** (`handoff` / `firstOf` / `.speculative` / `loopUntil` / aggregators / forum captains), **HITL** gates + an **eval** harness, an eighth model provider (**Google Gemini**), **agent.json** serialization, and the **capability-grants** DSL (`grants { allow / confirm }`) — itself on 0.7.x's runtime ToolPolicy enforcement (in-JVM gate + OS sandbox), the standalone `agents-kt` CLI, tamper-evident audit ledger, fail-loud skill routing, the `onLLMError` policy, the Perplexity connector + `perplexitySearch` grounded tool, and 0.6.0's permission manifests / JSONL audit export / OTel-LangSmith-Langfuse bridges / constrained decoding. *Next (0.9.0):* typed multi-agent choreography, plus the remaining Layer-2 sandbox backends (Docker / egress proxy / read confinement).
 - **LangChain 0.3.x** — stable, ecosystem mature. LCEL is the recommended composition surface.
 - **Semantic Kernel 1.x** — stable, MCP integration in preview.
 - **AutoGen 0.4.x** — major architectural rewrite landed; the new core/agentchat split is recent.
diff --git a/docs/roadmap.md b/docs/roadmap.md
index 0efdc8e..fb1970b 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -9,6 +9,8 @@
 0.6.0 Boundaries you can audit — shipped (epic [#1911](../../issues/1911))
 0.7.0 Boundaries you can enforce externally — shipped (epic [#2879](../../issues/2879))
 0.8.0 Interoperable, multimodal agents (+ grants) — shipped (A2A v1, multimodal, RAG, composition, Gemini, capability grants)
+0.8.1 The agentic web: discover, serve, get paid — shipped (AGNTCY, AG-UI, NLWeb, x402 seller + experimental buyer)
+0.8.2 Standards & trust hardening — shipped (x402 buyer: mandatory guardrails + session limits + signer seam + CAIP-2 ids; release-truth metadata)
 0.9.0 Layer-2 sandbox backends — next (Docker/proxy/read-confinement)
 ```
 
diff --git a/release-metadata.yaml b/release-metadata.yaml
new file mode 100644
index 0000000..1948ac5
--- /dev/null
+++ b/release-metadata.yaml
@@ -0,0 +1,27 @@
+# Single source of truth for release / version / provider / protocol claims.
+#
+# Why this file exists: version + provider + protocol claims were duplicated across the README, roadmap,
+# comparison page, POM, and website, and drifted (0.8.1 shipped but several surfaces still said 0.8.0/0.7.2).
+# `ReleaseMetadataConsistencyTest` validates the in-repo surfaces against this file so a single edit here
+# keeps them honest. Update this file as part of every release (RELEASE_RUNBOOK step 6).
+#
+# Schema is intentionally flat + simple-scalar so a dependency-free test can parse it.
+
+currentRelease: "0.8.2" # latest version resolvable on Maven Central
+developmentVersion: "0.8.2-SNAPSHOT"
+providers: 8 # Ollama, Claude, OpenAI, DeepSeek, Kimi, OpenRouter, Perplexity, Gemini
+
+nextRelease:
+  version: "0.9.0"
+  theme: "choreography you can't deadlock (typed multi-agent coordination)"
+
+# Protocol support is a SLICE, not necessarily a full implementation — state it honestly.
+# status: shipped | partial | experimental ; support = the supported subset.
+protocols:
+  mcp: { status: shipped, support: "client + server (McpServer.from(agent), McpRunner)" }
+  a2a: { status: shipped, version: "0.2", support: "server + typed client; message/send; capabilities.extensions" }
+  agui: { status: partial, support: "streaming subset (lifecycle/text/reasoning/tool incl. TOOL_CALL_RESULT/step); STATE + client-tool round-trips deferred" }
+  nlweb: { status: partial, targetVersion: "0.55", support: "/ask compatibility slice (single non-streaming response)" }
+  x402: { status: experimental, version: "1", support: "seller gate + buyer (mandatory guardrails, cross-payment session limits, X402Signer seam); accepts CAIP-2 network ids (v1 wire; v2 transport deferred)" }
+  agntcy: { status: shipped, support: "OASF export/import + Identity badge verify + DIR Store/Search/Routing client" }
+  ap2: { status: spiked, support: "feasibility spike (:agents-kt-ap2): mandate verify + x402 settle + AgentCard advertise" }
diff --git a/src/test/kotlin/agents_engine/core/ReleaseMetadataConsistencyTest.kt b/src/test/kotlin/agents_engine/core/ReleaseMetadataConsistencyTest.kt
new file mode 100644
index 0000000..d070713
--- /dev/null
+++ b/src/test/kotlin/agents_engine/core/ReleaseMetadataConsistencyTest.kt
@@ -0,0 +1,82 @@
+package agents_engine.core
+
+import agents_engine.model.ModelProvider
+import java.nio.file.Files
+import java.nio.file.Path
+import kotlin.test.Test
+import kotlin.test.assertEquals
+import kotlin.test.assertTrue
+
+// Single-source-of-truth guard (#0.8.2 release-truth). An external audit found the release version + provider
+// claims drifting across README / roadmap / comparison / site (0.8.1 shipped but several surfaces still said
+// 0.8.0/0.7.2). `release-metadata.yaml` is now the one place those claims live; this test pins the in-repo
+// surfaces to it, so the next release that edits only the metadata file fails the build until the prose moves.
+class ReleaseMetadataConsistencyTest {
+
+    private fun read(relative: String): String {
+        val path = Path.of(relative)
+        assertTrue(Files.exists(path), "expected $relative to exist (tests run from the repo root)")
+        return Files.readString(path)
+    }
+
+    private val metadata = read("release-metadata.yaml")
+
+    // Top-level scalar: the line `key: value [# comment]` at column 0; strip inline comment + quotes.
+    private fun scalar(key: String): String {
+        val line = metadata.lineSequence().firstOrNull { it.startsWith("$key:") }
+            ?: error("release-metadata.yaml is missing a top-level scalar '$key'")
+        return line.substringAfter(':').substringBefore('#').trim().trim('"')
+    }
+
+    private val currentRelease = scalar("currentRelease")
+    private val developmentVersion = scalar("developmentVersion")
+    private val providers = scalar("providers").toInt()
+
+    @Test
+    fun `provider count in metadata tracks ModelProvider entries`() {
+        assertEquals(
+            ModelProvider.entries.size, providers,
+            "release-metadata.yaml providers=$providers but ModelProvider has ${ModelProvider.entries.size} entries",
+        )
+    }
+
+    @Test
+    fun `gradle development version matches metadata`() {
+        val gradleVersion = Regex("""^version\s*=\s*"([^"]+)"""", RegexOption.MULTILINE)
+            .find(read("build.gradle.kts"))?.groupValues?.get(1) ?: error("no version in build.gradle.kts")
+        // On normal dev `main` the build is the -SNAPSHOT dev version; during a release commit it is exactly the
+        // currentRelease (checkReadmeVersion / checkSnapshotPolicy own that transition). Accept either.
+        assertTrue(
+            gradleVersion == developmentVersion || gradleVersion == currentRelease,
+            "build.gradle.kts version '$gradleVersion' is neither the metadata developmentVersion " +
+                "'$developmentVersion' nor currentRelease '$currentRelease'",
+        )
+    }
+
+    @Test
+    fun `README dependency snippet and Current Release both name the current release`() {
+        val readme = read("README.md")
+        assertTrue(
+            readme.contains("ai.deep-code:agents-kt:$currentRelease"),
+            "README dependency snippet must advertise the current release $currentRelease",
+        )
+        val currentReleaseSection = readme.substringAfter("## Current Release").substringBefore("\n## ")
+        assertTrue(
+            currentReleaseSection.contains(currentRelease),
+            "README 'Current Release' section must name $currentRelease (found stale content)",
+        )
+    }
+
+    @Test
+    fun `roadmap and comparison name the current release`() {
+        assertTrue(
+            read("docs/roadmap.md").contains(currentRelease),
+            "docs/roadmap.md must mention the current release $currentRelease",
+        )
+        val comparison = read("docs/comparison.md")
+        assertTrue(
+            comparison.contains("$currentRelease (latest release)"),
+            "docs/comparison.md must mark $currentRelease as the latest release",
+        )
+    }
+}
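
Review note: the dependency-free parsing that `ReleaseMetadataConsistencyTest` relies on can be tried in isolation. The sketch below is a minimal stand-alone approximation, not the project's code: `topLevelScalar` and `gradleVersion` are hypothetical helper names, and the inline YAML and build-script strings are made-up fixtures that mirror the shapes in this diff.

```kotlin
// Hypothetical helper mirroring the test's `scalar`: finds `key: value [# comment]`
// at column 0, then strips the inline comment and surrounding double quotes.
fun topLevelScalar(yaml: String, key: String): String {
    val line = yaml.lineSequence().firstOrNull { it.startsWith("$key:") }
        ?: error("missing top-level scalar '$key'")
    return line.substringAfter(':').substringBefore('#').trim().trim('"')
}

// Hypothetical helper mirroring the test's regex: the first `version = "..."`
// assignment that starts a line of the build script, or null if there is none.
fun gradleVersion(buildScript: String): String? =
    Regex("""^version\s*=\s*"([^"]+)"""", RegexOption.MULTILINE)
        .find(buildScript)?.groupValues?.get(1)

fun main() {
    // Made-up fixture shaped like release-metadata.yaml.
    val yaml = """
        currentRelease: "0.8.2" # latest version resolvable on Maven Central
        developmentVersion: "0.8.2-SNAPSHOT"
        providers: 8 # provider list elided
        nextRelease:
          version: "0.9.0"
    """.trimIndent()

    // Made-up fixture shaped like the top of build.gradle.kts.
    val buildScript = "group = \"ai.deep-code\"\nversion = \"0.8.2\"\n"

    val current = topLevelScalar(yaml, "currentRelease")
    val dev = topLevelScalar(yaml, "developmentVersion")
    val built = gradleVersion(buildScript)

    println("currentRelease=$current developmentVersion=$dev gradle=$built")
    // Same acceptance rule as the test: either the dev version or the release version.
    check(built == dev || built == current) { "gradle version '$built' matches neither" }
}
```

Two limits of this approach are worth keeping in mind: a nested key such as `nextRelease.version` is never matched because it is indented, and a quoted value that itself contains `#` would be cut at that character. Both seem acceptable for the flat schema the metadata file describes.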