feat(agent-bff): fetch and cache agent schema, capabilities and allow-list#1732
Conversation
…-list Add a read-model layer that caches the agent schema (24h), derives collection/relation/action allow-list metadata with relation targets and an action-endpoint map, and caches per-collection capabilities coupled to the schema refresh. Adds a Metrics port emitting schema-cache and action-endpoint counters. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2 new issues
|
| this.ttlMs = ttlMs; | ||
| } | ||
|
|
||
| async get(): Promise<ForestSchemaCollection[]> { |
There was a problem hiding this comment.
🟡 Medium read-model/schema-cache.ts:46
When the TTL expires, get() calls refresh() and returns the awaited promise, so every concurrent reader blocks on the network fetch instead of immediately receiving the last-good schema. While this.inFlight is pending (e.g. Forest is slow or timing out), all requests stall despite usable cached data, breaking the intended "serve stale until refresh succeeds" behavior. Consider returning the stale this.entry.collections immediately when a warm cache exists and only triggering a background refresh.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @packages/agent-bff/src/read-model/schema-cache.ts around line 46:
When the TTL expires, `get()` calls `refresh()` and returns the awaited promise, so every concurrent reader blocks on the network fetch instead of immediately receiving the last-good schema. While `this.inFlight` is pending (e.g. Forest is slow or timing out), all requests stall despite usable cached data, breaking the intended "serve stale until refresh succeeds" behavior. Consider returning the stale `this.entry.collections` immediately when a warm cache exists and only triggering a background refresh.
There was a problem hiding this comment.
Intentional — this is synchronous refresh-on-read, chosen over stale-while-revalidate during design.
The "serve last-good" contract applies on a refresh failure (warm cache keeps serving stale + emits schema_cache_refresh_error), not while a refresh is in flight. When the TTL expires we deliberately block on the refresh: concurrent readers share a single in-flight fetch (dedup, so one network call), the wait is bounded by the client's 10s timeout, and it only happens at the 24h expiry boundary (~once/day). Serving a stale schema when a fresh one is one await away was a conscious no (it also matches the MCP server precedent).
Leaving as-is.
|
Coverage Impact ⬆️ Merging this pull request will increase total coverage on Modified Files with Diff Coverage (10) 🛟 Help
|
…argets
A reference is `${foreignCollection}.${key}`; the collection name may itself
contain dots (mongoose nested collections like `User.address`). Split on the
last dot only instead of taking the first segment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… store Add tests for createReadModel wiring, ForestSchemaClient (mocked SchemaService), ReadModelStore.ageSeconds delegation, and the read-model defensive branches (no-reference relation, actions-less collection, unknown collection). read-model files at 100% coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| private buildRelations(collection: ForestSchemaCollection): void { | ||
| const relations = new Map<string, RelationTarget>(); | ||
|
|
||
| for (const field of collection.fields) { |
There was a problem hiding this comment.
This throws on a malformed payload where fields is absent: guard with collection.fields ?? [] like buildActionEndpoints does for actions. It runs outside SchemaCache try/catch, so a bad collection wedges the store for the whole 24h TTL with no SchemaUnavailableError and no metric.
| private async doRefresh(): Promise<ForestSchemaCollection[]> { | ||
| try { | ||
| const collections = await this.fetcher.fetchSchema(); | ||
| this.entry = { collections, fetchedAt: this.now() }; |
There was a problem hiding this comment.
An empty collections from a transient 200 gets cached as last-good for 24h and silently denies every collection/relation/action. What about rejecting an empty result here so the cold-cache error path fires instead?
|
|
||
| const { generation } = this; | ||
| const fetch = fetcher(collection) | ||
| .then(result => { |
There was a problem hiding this comment.
Unlike SchemaCache.doRefresh, this has no .catch: a failing capabilities() propagates raw with no error counter and no last-good fallback. What about mirroring the schema cache here (emit a counter, serve stale)?
| } | ||
|
|
||
| async get(collection: string, fetcher: CapabilitiesFetcher): Promise<CapabilitiesResult> { | ||
| const entry = this.entries.get(collection); |
There was a problem hiding this comment.
Cache is keyed by collection only while the token is bound in the fetcher, so the first caller result is served to every token for 24h. Can we confirm /capabilities is not token-scoped server-side? If it is, key by token.
| async getReadModel(): Promise<ReadModel> { | ||
| const collections = await this.schemaCache.get(); | ||
|
|
||
| if (collections !== this.lastCollections || !this.readModel) { |
There was a problem hiding this comment.
Refresh detection by array-reference identity is a contract only the doc-comment enforces, no test pins it: a future defensive copy in SchemaCache would rebuild and clear capabilities on every request. Returning a version counter from get() would make it testable.
| name: action.name, | ||
| endpoint: action.endpoint, | ||
| hooks: action.hooks, | ||
| fields: action.fields, |
There was a problem hiding this comment.
fields and hooks are stored by reference from the shared cached schema, so a future mutating consumer would corrupt the cache for all requests. What about freezing or shallow-cloning them here?
| it('should default to console metrics when none is provided', () => { | ||
| const bundle = createReadModel({ forestServerUrl: 'x', envSecret: 'y' }); | ||
|
|
||
| expect(bundle.store).toBeDefined(); |
There was a problem hiding this comment.
These assert only toBeDefined, not that console metrics were selected: the metrics ?? createConsoleMetrics fallback could be deleted and the test still passes. Assert an increment call routes through the console logger.

Fixes PRD-668
What
Adds the BFF read-model layer (Slice 2): an in-memory, per-instance cache of the agent schema and the metadata every later BFF slice reads instead of re-fetching.
schema-cache.ts) — fetches viaSchemaService.getSchema()(MCP precedent), 24h TTL, in-flight dedup. The TTL only triggers a refresh; the last good schema is served until a refresh succeeds. Cold cache → typedSchemaUnavailableError; warm refresh failure → serves last-good +schema_cache_refresh_error. Exposesschema_cache_age_seconds.read-model.ts) — collection/relation/action allow-lists and relation targets, keyed(collection, name)(relation/action names are collection-scoped). Action allow-list = endpoint-map keys (endpoint-less actions are not exposed). Action-endpoint map typed as agent-clientActionEndpointsByCollection.action-endpoint-resolver.ts) — never throws; emitsmiss/errorcounters taggedrendering/collection/action.capabilities-cache.ts) — per-collection, 24h, fetcher passed per call (bound to the request token), invalidated with the schema. A generation guard prevents an in-flight fetch from repopulating stale data after a refresh.ReadModelStore— single owner of the coupled schema + capabilities lifecycle: a successful refresh rebuilds the read-model and clears capabilities atomically.Metricsport + console adapter (no metrics backend exists yet).Out of scope (later slices)
No operator/field validation (PRD-640), no data endpoints (PRD-639), no nested-relation guard (PRD-669), no permissions/OpenAPI, no active cache invalidation. Nothing is wired into the HTTP runtime yet — Slice 3 (PRD-671) constructs and injects it.
Tests
Build, lint, and the full suite pass (403 tests, incl. 42 new). Covers cache hit/miss, 24h boundary, cold/warm failure + re-attempt, schema-refresh invalidates capabilities, collection-scoped keying, and the miss/error counters.
🤖 Generated with Claude Code
Note
Add schema and capabilities caching with collection allow-list to agent-bff read model
ReadModelclass built from Forest schema collections that tracks allowed collections, relations, and action endpoints, including polymorphic and dotted-reference relations.SchemaCachewith a 24h TTL, stale-on-error fallback, concurrent refresh deduplication, and age/error metrics; throwsSchemaUnavailableErroron cold-cache failure.CapabilitiesCachewith a 24h TTL per collection, concurrent fetch deduplication, and generation-safe invalidation when the schema changes.ActionEndpointResolverthat returnsundefined(never throws) and emits tagged metrics on read-model errors or missing action mappings.createReadModel, which defaults to console-backed metrics when none are provided.Macroscope summarized 7f52f8d.