Skip to content

feat(agent-bff): fetch and cache agent schema, capabilities and allow-list#1732

Open
nbouliol wants to merge 3 commits into
mainfrom
feature/prd-668-fetch-and-cache-the-agent-schema-capabilities-and-allow-list
Open

feat(agent-bff): fetch and cache agent schema, capabilities and allow-list#1732
nbouliol wants to merge 3 commits into
mainfrom
feature/prd-668-fetch-and-cache-the-agent-schema-capabilities-and-allow-list

Conversation

@nbouliol

@nbouliol nbouliol commented Jul 2, 2026

Copy link
Copy Markdown
Member

Fixes PRD-668

What

Adds the BFF read-model layer (Slice 2): an in-memory, per-instance cache of the agent schema and the metadata every later BFF slice reads instead of re-fetching.

  • Schema cache (schema-cache.ts) — fetches via SchemaService.getSchema() (MCP precedent), 24h TTL, in-flight dedup. The TTL only triggers a refresh; the last good schema is served until a refresh succeeds. Cold cache → typed SchemaUnavailableError; warm refresh failure → serves last-good + schema_cache_refresh_error. Exposes schema_cache_age_seconds.
  • Read-model (read-model.ts) — collection/relation/action allow-lists and relation targets, keyed (collection, name) (relation/action names are collection-scoped). Action allow-list = endpoint-map keys (endpoint-less actions are not exposed). Action-endpoint map typed as agent-client ActionEndpointsByCollection.
  • Action-endpoint resolver (action-endpoint-resolver.ts) — never throws; emits miss/error counters tagged rendering/collection/action.
  • Capabilities cache (capabilities-cache.ts) — per-collection, 24h, fetcher passed per call (bound to the request token), invalidated with the schema. A generation guard prevents an in-flight fetch from repopulating stale data after a refresh.
  • ReadModelStore — single owner of the coupled schema + capabilities lifecycle: a successful refresh rebuilds the read-model and clears capabilities atomically.
  • Metrics port + console adapter (no metrics backend exists yet).

Out of scope (later slices)

No operator/field validation (PRD-640), no data endpoints (PRD-639), no nested-relation guard (PRD-669), no permissions/OpenAPI, no active cache invalidation. Nothing is wired into the HTTP runtime yet — Slice 3 (PRD-671) constructs and injects it.

Tests

Build, lint, and the full suite pass (403 tests, incl. 42 new). Covers cache hit/miss, 24h boundary, cold/warm failure + re-attempt, schema-refresh invalidates capabilities, collection-scoped keying, and the miss/error counters.

🤖 Generated with Claude Code

Note

Add schema and capabilities caching with collection allow-list to agent-bff read model

  • Introduces a ReadModel class built from Forest schema collections that tracks allowed collections, relations, and action endpoints, including polymorphic and dotted-reference relations.
  • Adds SchemaCache with a 24h TTL, stale-on-error fallback, concurrent refresh deduplication, and age/error metrics; throws SchemaUnavailableError on cold-cache failure.
  • Adds CapabilitiesCache with a 24h TTL per collection, concurrent fetch deduplication, and generation-safe invalidation when the schema changes.
  • Adds ActionEndpointResolver that returns undefined (never throws) and emits tagged metrics on read-model errors or missing action mappings.
  • Wires everything together in createReadModel, which defaults to console-backed metrics when none are provided.

Macroscope summarized 7f52f8d.

…-list

Add a read-model layer that caches the agent schema (24h), derives
collection/relation/action allow-list metadata with relation targets and
an action-endpoint map, and caches per-collection capabilities coupled to
the schema refresh. Adds a Metrics port emitting schema-cache and
action-endpoint counters.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jul 2, 2026

Copy link
Copy Markdown

PRD-668

@qltysh

qltysh Bot commented Jul 2, 2026

Copy link
Copy Markdown

2 new issues

Tool Category Rule Count
qlty Structure Function with many returns (count = 4): get 2

Comment thread packages/agent-bff/src/read-model/read-model.ts Outdated
this.ttlMs = ttlMs;
}

async get(): Promise<ForestSchemaCollection[]> {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium read-model/schema-cache.ts:46

When the TTL expires, get() calls refresh() and returns the awaited promise, so every concurrent reader blocks on the network fetch instead of immediately receiving the last-good schema. While this.inFlight is pending (e.g. Forest is slow or timing out), all requests stall despite usable cached data, breaking the intended "serve stale until refresh succeeds" behavior. Consider returning the stale this.entry.collections immediately when a warm cache exists and only triggering a background refresh.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @packages/agent-bff/src/read-model/schema-cache.ts around line 46:

When the TTL expires, `get()` calls `refresh()` and returns the awaited promise, so every concurrent reader blocks on the network fetch instead of immediately receiving the last-good schema. While `this.inFlight` is pending (e.g. Forest is slow or timing out), all requests stall despite usable cached data, breaking the intended "serve stale until refresh succeeds" behavior. Consider returning the stale `this.entry.collections` immediately when a warm cache exists and only triggering a background refresh.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional — this is synchronous refresh-on-read, chosen over stale-while-revalidate during design.

The "serve last-good" contract applies on a refresh failure (warm cache keeps serving stale + emits schema_cache_refresh_error), not while a refresh is in flight. When the TTL expires we deliberately block on the refresh: concurrent readers share a single in-flight fetch (dedup, so one network call), the wait is bounded by the client's 10s timeout, and it only happens at the 24h expiry boundary (~once/day). Serving a stale schema when a fresh one is one await away was a conscious no (it also matches the MCP server precedent).

Leaving as-is.

@qltysh

qltysh Bot commented Jul 2, 2026

Copy link
Copy Markdown

Qlty


Coverage Impact

⬆️ Merging this pull request will increase total coverage on main by 0.03%.

Modified Files with Diff Coverage (10)

RatingFile% DiffUncovered Line #s
New Coverage rating: A
packages/agent-bff/src/read-model/create-read-model.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/forest-schema-client.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/read-model.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/schema-cache.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/agent-capabilities-fetcher.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/action-endpoint-resolver.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/capabilities-cache.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/read-model-store.ts100.0%
New Coverage rating: A
packages/agent-bff/src/read-model/errors.ts100.0%
New Coverage rating: A
packages/agent-bff/src/adapters/console-metrics.ts100.0%
Total100.0%
🚦 See full report on Qlty Cloud »

🛟 Help
  • Diff Coverage: Coverage for added or modified lines of code (excludes deleted files). Learn more.

  • Total Coverage: Coverage for the whole repository, calculated as the sum of all File Coverage. Learn more.

  • File Coverage: Covered Lines divided by Covered Lines plus Missed Lines. (Excludes non-executable lines including blank lines and comments.)

    • Indirect Changes: Changes to File Coverage for files that were not modified in this PR. Learn more.

nbouliol and others added 2 commits July 2, 2026 17:46
…argets

A reference is `${foreignCollection}.${key}`; the collection name may itself
contain dots (mongoose nested collections like `User.address`). Split on the
last dot only instead of taking the first segment.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… store

Add tests for createReadModel wiring, ForestSchemaClient (mocked
SchemaService), ReadModelStore.ageSeconds delegation, and the read-model
defensive branches (no-reference relation, actions-less collection, unknown
collection). read-model files at 100% coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
private buildRelations(collection: ForestSchemaCollection): void {
const relations = new Map<string, RelationTarget>();

for (const field of collection.fields) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws on a malformed payload where fields is absent: guard with collection.fields ?? [] like buildActionEndpoints does for actions. It runs outside SchemaCache try/catch, so a bad collection wedges the store for the whole 24h TTL with no SchemaUnavailableError and no metric.

private async doRefresh(): Promise<ForestSchemaCollection[]> {
try {
const collections = await this.fetcher.fetchSchema();
this.entry = { collections, fetchedAt: this.now() };

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty collections from a transient 200 gets cached as last-good for 24h and silently denies every collection/relation/action. What about rejecting an empty result here so the cold-cache error path fires instead?


const { generation } = this;
const fetch = fetcher(collection)
.then(result => {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike SchemaCache.doRefresh, this has no .catch: a failing capabilities() propagates raw with no error counter and no last-good fallback. What about mirroring the schema cache here (emit a counter, serve stale)?

}

async get(collection: string, fetcher: CapabilitiesFetcher): Promise<CapabilitiesResult> {
const entry = this.entries.get(collection);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache is keyed by collection only while the token is bound in the fetcher, so the first caller result is served to every token for 24h. Can we confirm /capabilities is not token-scoped server-side? If it is, key by token.

async getReadModel(): Promise<ReadModel> {
const collections = await this.schemaCache.get();

if (collections !== this.lastCollections || !this.readModel) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refresh detection by array-reference identity is a contract only the doc-comment enforces, no test pins it: a future defensive copy in SchemaCache would rebuild and clear capabilities on every request. Returning a version counter from get() would make it testable.

name: action.name,
endpoint: action.endpoint,
hooks: action.hooks,
fields: action.fields,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields and hooks are stored by reference from the shared cached schema, so a future mutating consumer would corrupt the cache for all requests. What about freezing or shallow-cloning them here?

it('should default to console metrics when none is provided', () => {
const bundle = createReadModel({ forestServerUrl: 'x', envSecret: 'y' });

expect(bundle.store).toBeDefined();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assert only toBeDefined, not that console metrics were selected: the metrics ?? createConsoleMetrics fallback could be deleted and the test still passes. Assert an increment call routes through the console logger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants