Add config store runtime config spec#647
Conversation
aram356
left a comment
There was a problem hiding this comment.
Summary
Spec for moving trusted-server.toml from build-time embedded config to runtime config-store loading. Scope and fail-closed posture are good, but several load-bearing details are unspecified or contradict the existing pipeline. Requesting changes on the items below before merge.
Blocking
🔧 wrench
- Hash algorithm not specified despite explicit "future attestation" goal (lines 328–332): The doc commits to a config hash over canonical TOML bytes but never names SHA-256 vs BLAKE3 vs SHA-3. This locks in (or breaks) the attestation contract in
2026-01-15-attestation-design.md. Pin the algorithm + encoding (suggest SHA-256, hex) or explicitly defer to the attestation spec with a forward reference. - Canonical TOML form is underspecified for hash stability (lines 310–318): "Stable ordering" leaves these undefined: key ordering rule (lexicographic vs schema-declared vs insertion), string flavor (basic
"..."vs literal'...'), numeric formatting (1vs1.0), table vs inline-table, array-of-tables vs inline-array, Unicode normalization. Twotoml-rsversions can disagree. Either enumerate the rules or pin a canonicalizer crate + version. - "Explicitly declared settings only" contradicts the existing Serialize round-trip (lines 313–316): Today
crates/trusted-server-core/build.rs:44-45doestoml::to_string_pretty(&settings), including defaults. The new contract requires a "provided vs defaulted" distinction that#[serde(default)]erases. Implementers will need customSerializeor a default-elision pass over a baselineSettings. Pick an approach or call this out as a known cost. - Bootstrap "store reference" is platform-undefined (lines 222–226): Fastly Compute references config stores by name at the WASM guest. The bootstrap section should at least state: on Fastly the store-name is a fixed constant compiled into the adapter, and
fastly.tomlmust declare ats-config[local_server.config_stores]for dev. As written the "bootstrap contract" is not actionable for the only currently supported platform. deny_unknown_fieldsis a behavioral break with no migration path (lines 278–286):Settingsdoes not currently use#[serde(deny_unknown_fields)]. The new contract fail-closes any operator TOML with stale or typo'd keys. Spec should explicitly state (a) this is a breaking change vs. existing operator configs and (b) theintegrationscarve-out applies only to per-integration payload internals — the dynamic top-level[integrations.<id>]keyspace itself is not denied.- Per-request load cost forecloses straightforward optimization (lines 198–213): "Correctness does not depend on cross-request in-memory persistence" plus Fastly's per-request guest model means a worst-case implementation does TOML parse + per-handler regex compile (
prepare_runtime()) every request. The spec allows in-memory caching as a perf optimization but forbids it for correctness. Worth flagging as a hard implementation constraint so readers don't underestimate cost.
❓ question
- Source-selection signal is unspecified (lines 249–258): Step 1 says "Select the source by environment" but never describes the runtime mechanism — cargo feature, env var, Fastly service-ID check, bootstrap flag? This is the central pipeline decision and is undefined. Either pin the mechanism or explicitly defer with a marker.
- Secrets migration is missing:
trusted-server.tomlcarriesedge_cookie.secret_keyandpublisher.proxy_secret(crates/trusted-server-core/src/settings_data.rs:43-57), overridden today viaTRUSTED_SERVER__EDGE_COOKIE__SECRET_KEYandTRUSTED_SERVER__PUBLISHER__PROXY_SECRETat build time. Removing build-time env merging implies either (a) plaintext secrets in the (unencrypted) Fastly Config Store, or (b) splitting secrets intoPlatformSecretStoreand referencing them by name fromSettings. The spec is silent — which is the intended model?
Non-blocking
🤔 thinking
- "Production vs development" runtime distinction is largely cosmetic (lines 187–190): Both paths read
ts-configfrom a (real or simulated) config store. The actual difference is who populates the store, not how the runtime reads it. Collapsing to one runtime path plus a tooling pre-flight step would simplify the doc. - Fastly Config Store value-size limit is not addressed: Fastly caps each value (~8 KB historically). Current canonical TOML is ~3.4 KB; source is ~6.8 KB. Future growth could exceed the limit. One line in "Implementation notes" surfaces this at design time.
- Multi-tenant constraint of fixed
ts-configkey (lines 235–243): Hard-coding the key globally precludes per-tenant config via key multiplexing in a shared binary. Likely intentional — call it out so future cross-platform abstractions don't assume key namespacing.
♻️ refactor
- Cross-link to existing attestation spec: Spec mentions "future attestation" four times.
docs/superpowers/specs/2026-01-15-attestation-design.mdalready exists. Add a "Related work" section linking it. - Name the existing platform abstraction:
crates/trusted-server-core/src/platform/traits.rs:12definesPlatformConfigStore. Spec says "existing generic config-store abstractions" but never names it. Naming makes implementation actionable. - Specify how the dev
ts-configpayload reaches Viceroy (lines 418–422): Viceroy reads config-store contents from[local_server.config_stores.<name>.contents]infastly.toml(fastly.toml:42-49). Either dev tooling regenerates that section perfastly compute serve, or an alternate loading mechanism. Worth one paragraph.
🌱 seedling
- Note
/healthcarve-out from fail-closed:crates/trusted-server-adapter-fastly/src/main.rs:55-61short-circuits/healthbefore settings load. Worth a one-liner so a future reader doesn't "fix" this as inconsistent with fail-closed semantics.
📌 out of scope
- Test fixtures + CI depend on tracked
trusted-server.toml: Removing it touchescrate_test_settings_str()consumers and 10+ tests incrates/trusted-server-core/src/settings.rs. Open a follow-up issue tracking the test/CI migration before implementation lands.
⛏ nitpick
- Add Author line: Peers use
> **Author:**. This spec omits it. - Inconsistent terminology: "platform config store" / "platform config-store" / "platform-store" used interchangeably. Pick one.
- Bullet phrasing at line 314:
do **not** expand…mid-sentence reads oddly inside the numbered list.
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- format-typescript: PASS
- format-docs: PASS
- Analyze (rust / actions / js-ts): PASS
- browser integration tests: PASS
- integration tests: PASS
- CodeQL: PASS
| This section defines the semantic pipeline for both production and development. | ||
|
|
||
| ### Step 1: Select the source by environment | ||
|
|
There was a problem hiding this comment.
❓ question — Source-selection signal is unspecified. Step 1 says "Select the source by environment" but never describes the runtime mechanism. Cargo feature? Env var read at module init? Fastly service-ID check? Bootstrap config flag? This is the central pipeline decision and is undefined. Either pin the mechanism here or defer explicitly (e.g., "to be defined in the tooling spec, but must be a compile-time signal").
| The config hash is computed over the canonical TOML bytes. | ||
|
|
||
| This provides a stable hashable representation of application config suitable | ||
| for observability and future attestation work. |
There was a problem hiding this comment.
🔧 wrench — Hash algorithm not specified despite explicit "future attestation" goal. The doc commits to a config hash over canonical TOML bytes but never names SHA-256 vs BLAKE3 vs SHA-3, nor the encoding (hex / base64 / raw). Choosing here locks in the attestation contract in 2026-01-15-attestation-design.md. Pin the algorithm + encoding (suggest SHA-256, hex) or explicitly defer to the attestation spec with a forward reference.
| - do **not** expand the config into a full dump of all effective defaulted runtime values | ||
| - define stable ordering for map-like structures so identical semantic config | ||
| produces identical canonical bytes | ||
|
|
There was a problem hiding this comment.
🔧 wrench — Canonical TOML form is underspecified for hash stability. "Define stable ordering for map-like structures" leaves the following undefined and stability-critical:
- key ordering rule (lexicographic vs schema-declared vs insertion)
- string flavor (basic
"..."vs literal'...') - numeric formatting (
1vs1.0) - table vs inline-table normalization
- array-of-tables vs inline-array
- Unicode normalization of string values
Two correct implementations (or two toml-rs versions) can produce different bytes from the same Settings and break hash stability. Either enumerate the rules explicitly or pin a canonicalizer crate + version.
| - serialize it in a deterministic TOML form | ||
| - include explicitly declared settings only | ||
| - do **not** expand the config into a full dump of all effective defaulted runtime values | ||
| - define stable ordering for map-like structures so identical semantic config |
There was a problem hiding this comment.
🔧 wrench — "Explicitly declared settings only" contradicts the existing Serialize round-trip. Today crates/trusted-server-core/build.rs:44-45 does toml::to_string_pretty(&settings), which includes defaults. The new contract requires a "provided vs defaulted" distinction that #[serde(default)] erases. Implementers will need either custom Serialize impls per type or a default-elision pass that compares against a baseline Settings. Pick an approach or call this out as a known cost.
| Production bootstrap must provide: | ||
|
|
||
| - the platform config store reference needed to open/read the store | ||
|
|
There was a problem hiding this comment.
🔧 wrench — Bootstrap "store reference" is platform-undefined. Fastly Compute references config stores by name at the WASM guest level — the names live in fastly.toml, not in environment. The spec should at least state:
- On Fastly, the bootstrap form is a fixed store-name constant compiled into the adapter.
fastly.tomlmust declare ats-config[local_server.config_stores]entry for dev.
As written, the bootstrap contract is not actionable for the only currently supported platform.
|
|
||
| This spec does not define a special "healthy but unusable" mode for config | ||
| failure. | ||
|
|
There was a problem hiding this comment.
🌱 seedling — Note /health carve-out from fail-closed. crates/trusted-server-adapter-fastly/src/main.rs:55-61 short-circuits /health before settings load, which is a deliberate carve-out from the fail-closed model defined here. A one-liner acknowledging this would prevent a future reader from "fixing" the early-return as a bug.
| - tracked template file | ||
| - kept in sync with currently supported configuration features | ||
| - intended to help operators create a real `trusted-server.toml` | ||
|
|
There was a problem hiding this comment.
📌 out of scope — Test fixtures + CI depend on tracked trusted-server.toml. Removing it touches crate_test_settings_str() consumers and 10+ tests in crates/trusted-server-core/src/settings.rs, plus CI workflows that rely on the file existing at the repo root. Worth a follow-up issue tracking the test/CI migration before this becomes implementable.
|
|
||
| > **Status:** Proposal | ||
| > **Scope:** Config store architecture only | ||
| > **Date:** April 2026 |
There was a problem hiding this comment.
⛏ nitpick — Peer specs (e.g., 2026-01-15-attestation-design.md) include an Author: line. Adding one keeps consistency.
| - **Development:** local file at repo root by default | ||
|
|
||
| The authoritative production payload is stored in the platform config store under | ||
| key `ts-config`. |
There was a problem hiding this comment.
⛏ nitpick — Inconsistent terminology. "platform config store" / "platform config-store" / "platform-store" / "config-store payload" / "platform store payload" appear interchangeably. Pick one form for the doc.
| - parse valid config through the typed config model | ||
| - serialize it in a deterministic TOML form | ||
| - include explicitly declared settings only | ||
| - do **not** expand the config into a full dump of all effective defaulted runtime values |
There was a problem hiding this comment.
⛏ nitpick — do **not** expand the config into a full dump of all effective defaulted runtime values reads oddly mid-sentence inside this numbered list. Consider promoting it to its own bullet.
aram356
left a comment
There was a problem hiding this comment.
Summary
Second-pass review on the same SHA (1ff07c95) — the PR has had no new commits since the prior CHANGES_REQUESTED review on Apr 30, so all 18 prior findings still stand verbatim. This review adds findings the prior pass missed and elevates one finding that re-reading exposes as a contradiction rather than a footnote.
The load-bearing problem: the explicit downstream goal is attestation, which requires hash determinism. The spec demands hash stability but leaves the four ingredients open — algorithm, map-key ordering, EOL convention, "explicitly declared settings only" mechanism. Two correct independent implementations of this spec will produce different hashes today.
Blocking
🔧 wrench
- Availability section contradicts existing
/healthcarve-out:crates/trusted-server-adapter-fastly/src/main.rs:53-60already returns200 okbefore settings load — that is a "healthy but unusable" mode the spec text claims it does not define. (line 376) HashMapmap-typed sections produce non-deterministic canonical bytes:IntegrationSettings.entries(settings.rs:79) andresponse_headers(settings.rs:415) — concrete code paths the abstract canonical-ordering bullet has to address. (line 318)
Non-blocking
🤔 thinking
- Integration carve-out weakens fail-closed-on-typo: no integration config carries
deny_unknown_fieldstoday; typos under[prebid],[google_tag_manager], etc. still pass silently. Either harden integrations or move the limitation into Non-goals. (line 286) - EOL and trailing-newline unspecified for canonical bytes: CRLF/LF normalization and trailing-newline convention both affect the hash; both undefined. (line 322)
📝 note
trusted-server.example.tomldoes not exist today; "Add or retain" should just say "Add". (line 387)- Failure-behavior list reads exhaustive but isn't: missing store timeouts, partial reads, transient unavailability. (line 358)
🌱 seedling
- Test path is implicit in source selection:
crate_test_settings_str()flows through neither production nor development as defined. One-line acknowledgement under Step 1. (line 257)
CI Status
- fmt: PASS
- clippy: PASS
- rust tests: PASS
- js tests: PASS
- format-docs: PASS
| A config failure means the service is not healthy for serving application | ||
| traffic. | ||
|
|
||
| This spec does not define a special "healthy but unusable" mode for config |
There was a problem hiding this comment.
🔧 wrench — Availability section directly contradicts the existing /health carve-out.
Spec text says:
A config failure means the service is not healthy for serving application traffic. This spec does not define a special "healthy but unusable" mode for config failure.
But crates/trusted-server-adapter-fastly/src/main.rs:53-60 already short-circuits /health and returns 200 ok before get_settings() runs. That is a deliberate "healthy but unusable" mode. An implementer reading this section in isolation would (correctly per the spec) delete the carve-out, breaking external readiness probes.
Suggested rewording:
A config failure means the service is not healthy for serving *application* traffic.
Liveness probes (e.g. `/health`) are explicitly carved out from this rule and remain available regardless of config-load state. This is a deliberate liveness-vs-readiness split, not a hidden fallback path.
This was a 🌱 in the prior pass — re-reading, the spec text is an active contradiction, not a footnote. Elevating.
| - do **not** expand the config into a full dump of all effective defaulted runtime values | ||
| - define stable ordering for map-like structures so identical semantic config | ||
| produces identical canonical bytes | ||
|
|
There was a problem hiding this comment.
🔧 wrench — Map-typed sections in Settings use HashMap, whose serde serialization is order-unstable. The current canonicalization wrench is abstract; this is the concrete code path it has to address.
crates/trusted-server-core/src/settings.rs:79—IntegrationSettings { entries: HashMap<String, JsonValue> }. Every integration block flows through this.crates/trusted-server-core/src/settings.rs:415—response_headers: HashMap<String, String>. Used by every handler-config entry.
toml::to_string_pretty(&settings) (still the build.rs round-trip on main) emits these in iteration order, which is non-deterministic across runs. Two consecutive canonicalizations of the same Settings produce different bytes today.
Suggested addition under Step 6:
- map-like fields (`HashMap` in current `Settings`) MUST be serialized in lexicographic key order. Non-deterministic iteration of the underlying collection is a canonicalization bug, not an implementation choice.
This turns the abstract "stable ordering for map-like structures" bullet into something an implementer can grep for and verify.
| This applies to strongly typed configuration sections in the application config | ||
| schema. The `integrations` section continues to follow the existing integration | ||
| configuration model, where integration IDs are discovered dynamically and each | ||
| integration's typed validation rules govern the contents of its own config. |
There was a problem hiding this comment.
🤔 thinking — The integration carve-out makes the fail-closed-on-typo guarantee weaker than the spec implies.
This paragraph exempts [integrations.*] from deny_unknown_fields, deferring strictness to each integration's typed config. Repo-wide grep on main:
grep -rn 'deny_unknown_fields' crates/trusted-server-core/src/
# (no matches)
No integration carries deny_unknown_fields today. So a typo in [prebid].tiemout_ms or [google_tag_manager].containner_id still passes silently — the very class of bug the rest of this spec is trying to close.
Two consistent options:
- Require integration
Deserializes to also carrydeny_unknown_fields. Hardens the contract to the integration boundary. - Add a Non-goal stating that the strict-parse contract stops at the integration boundary, and document the implication.
As written, the spec reads stronger than what the schema can deliver.
| As a consequence: | ||
|
|
||
| - comments are not preserved in canonical form | ||
| - original formatting is not preserved in canonical form |
There was a problem hiding this comment.
🤔 thinking — Canonical-bytes line-ending and trailing-newline conventions are unspecified. Both materially affect the hash.
- An operator authoring
trusted-server.tomlon Windows produces CRLF; the runtime canonicalizer reading that file and re-serializing viatoml::to_string_prettyproduces LF. Round-trip changes the hash. toml::to_string_prettydoes not append a trailing newline. Tooling pipelines often do (echo "$canon" > file). The CLI hash and runtime hash drift on a single byte.
Suggested addition to Step 6:
- canonical bytes use LF line endings only (any CR/CRLF in source is normalized at canonicalization time)
- canonical bytes terminate with exactly one trailing LF
The kind of ambiguity that surfaces three months in as "the CLI computes a different hash than the runtime."
|
|
||
| - Remove tracked `trusted-server.toml` from source control | ||
| - Add `trusted-server.toml` to `.gitignore` | ||
| - Add or retain `trusted-server.example.toml` as a tracked template file |
There was a problem hiding this comment.
📝 note — trusted-server.example.toml does not exist in the repo today. git ls-files | grep example returns nothing; only trusted-server.toml is tracked. "Add or retain" should just read "Add" (or "Create").
| choose a different file | ||
|
|
||
| This document does not define a separate first-class runtime `config_source` | ||
| mode. It defines source behavior in terms of production vs. development. |
There was a problem hiding this comment.
🌱 seedling — Test path is implicit in the production-vs-development source-selection model.
The dominant call site for Settings numerically is the test fixture crate_test_settings_str() → Settings::from_toml(...) (30+ tests across settings.rs, auth.rs, auction/orchestrator.rs, integrations). Tests flow through neither production nor development as defined here.
A one-line acknowledgement under Step 1 — something like "unit tests construct Settings directly and bypass source selection by design" — would prevent a future contributor from "fixing" the test path to satisfy the source-selection contract.
| - unknown fields | ||
| - missing required fields | ||
| - semantic validation failures | ||
|
|
There was a problem hiding this comment.
📝 note — Failure-behavior list reads as exhaustive but isn't. Missing cases:
- store read timeouts (Fastly's config-store API can stall)
- partial / truncated reads
- transient store unavailability (retryable vs not)
Either prefix the list with "examples include" (current text says "includes") or add these. Failure-mode docs are load-bearing for fail-closed contracts; the implementer needs to know whether retry-on-timeout is a fallback (disallowed by line 361-362) or just normal I/O.
Summary
Closes #665
Testing