Build searchable-driven search-doc generation + parity harness + baseline #5351
`searchDocFromFields(instance)` derives search-doc link depth from the explicit
`searchable` field annotation instead of from what the render loaded, and loads
the named link targets itself (targeted loading) rather than relying on store
residency. Parallel to `searchDoc`, which stays authoritative until the cutover.
Routes are dotted paths rooted at the indexed card's link fields; depth is
governed entirely by the annotations on the card being indexed — a card pulled
in as a link target does not re-consult its own `searchable`. `true` makes the
self link searchable, a dotted path makes a deeper link searchable, arrays
combine. Cycle clipping, `{ id }` for unfollowed / broken / not-loaded links,
linksToMany id normalization, and the query-backed-field skip are preserved
from the store-driven path; link targets enumerate their declared type, which
drops the unqueryable polymorphic-subtype bloat.
Integration tests verify the routes-come-only-from-the-indexed-card semantic
(a target's own searchable is dormant when pulled in; honored only when the
target is itself indexed; a dotted route on the indexer expands the deeper
link; an unannotated link stays `{ id }`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ontains-routing
Thread the owner's store through `searchableQueryableValue` instead of
re-deriving it per value: a contained FieldDef value may not be
store-associated, but a link reached through it must still load against the
owner's store. Guard targeted loading against thrown rejections (not just
returned CardErrors) so a missing / unloadable target degrades to `{ id }`.
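A sketch of that guard, assuming hypothetical shapes (`OwnerLoad`, `CardErrorLike`, and `loadOrRef` are illustrative, not the real store API): the owner's load function is passed in rather than re-derived from each value, and both a returned error and a thrown rejection degrade the link to `{ id }`.

```typescript
// Hypothetical shapes: the real store API and CardError type differ.
interface CardErrorLike {
  isCardError: true;
  status: number;
}
type Loaded = { id: string; [field: string]: unknown };
// The owner's store load, threaded down rather than re-derived per value.
type OwnerLoad = (id: string) => Promise<Loaded | CardErrorLike>;

function isCardError(v: unknown): v is CardErrorLike {
  return typeof v === "object" && v !== null && (v as { isCardError?: unknown }).isCardError === true;
}

async function loadOrRef(load: OwnerLoad, id: string): Promise<Loaded | { id: string }> {
  let result: Loaded | CardErrorLike;
  try {
    result = await load(id);
  } catch {
    // A thrown rejection degrades to a reference instead of failing the doc.
    return { id };
  }
  // A returned error (e.g. a 404) degrades the same way.
  return isCardError(result) ? { id } : result;
}
```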
Tests add the cycle-clip, missing-target `{ id }`, and contains-routing cases
alongside the existing routes-only-from-the-indexed-card coverage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y-diff tool
Differential parity test: the searchable-driven path follows a searchable link
to the same target, with the same contained data, that the store-driven render
loaded. Whole-doc byte-equality is intentionally not asserted yet — the new
spec keeps `{ id }` for every relationship while the store-driven path omits
unused links via `usedLinksToFieldsOnly`; reconciling that to an identical doc
is the cutover's gate, after the migration reproduces today's depth.
Declared-type test: a `linksTo(SimpleAuthor)` whose instance is a FancyAuthor
subtype drops the subtype-only field — the generator enumerates the declared
target type.
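The declared-type rule can be pictured as a projection onto the declared type's fields. This sketch models field definitions as plain name lists (a simplification; real card classes carry richer field metadata), and `projectToDeclared` and the `flair` field are hypothetical: a `FancyAuthor` instance reached through `linksTo(SimpleAuthor)` keeps only `SimpleAuthor`'s fields.

```typescript
// Hypothetical: field-name lists stand in for class-based field definitions.
type FieldNames = readonly string[];
const SimpleAuthor: FieldNames = ["name"];
const FancyAuthor: FieldNames = [...SimpleAuthor, "flair"]; // subtype adds a field

// Project a link target onto the fields of the link's *declared* type.
function projectToDeclared(
  instance: Record<string, unknown>,
  declared: FieldNames,
): Record<string, unknown> {
  const doc: Record<string, unknown> = { id: instance.id };
  for (const field of declared) {
    if (field in instance) doc[field] = instance[field];
  }
  return doc;
}
```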
searchable-parity-diff.ts: the realm-scale before/after validator — diffs a
realm's live store-driven search docs against the searchable-driven output per
card, ignoring `_cardType` and (optionally) the intended shallow-link
difference. Meaningful post-migration; the data-gathering is documented inline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b054e4c39
Host Test Results: 1 file, 1 suite, 2h 32m 37s ⏱️ (results for commit 0f3ce37).
Realm Server Test Results: 1 file ±0, 1 suite ±0, 9m 48s ⏱️ +32s (results for commit 0f3ce37; ± comparison against earlier commit 41f434e).
… links as ignorable
Resolve a not-loaded link's reference against the owner's relativeTo before the
targeted load/lookup (matching the lazy link getter): a relative `links.self`
like `./hassan` can't be `toURL`'d by the store, which would otherwise degrade
an expandable searchable link to `{ id }`.
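The resolution step can be sketched with the standard WHATWG `URL` constructor, which resolves a relative reference against a base and leaves an absolute one unchanged. `resolveLinkRef` is a hypothetical helper, and returning `undefined` so the caller can fall back to `{ id }` is an assumption about how the failure is surfaced; the example base URL is made up.

```typescript
// Resolve a link's `links.self` against the owner's relativeTo before loading.
function resolveLinkRef(self: string, relativeTo?: URL): string | undefined {
  try {
    // Relative refs like "./hassan" need a base; absolute refs ignore it.
    return new URL(self, relativeTo).href;
  } catch {
    // No usable base: the caller degrades the link to { id: self }.
    return undefined;
  }
}
```

Without a base, `new URL("./hassan")` throws, which is the failure mode that previously degraded an expandable link.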
In the staging parity-diff tool, recognize an array whose elements are all
shallow (and the empty plural) as a shallow link, so `--ignore-shallow-links`
also covers unrendered `linksToMany` relationships.
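A simplified, top-level-only sketch of the per-card comparison, including the array rule above. `isShallowLink` and `diffSearchDocs` are hypothetical names, a shallow ref is assumed to be an object whose only key is a string `id`, and plain `JSON.stringify` is used for brevity even though it is sensitive to key order. Only the omit-vs-`{ id }` case is suppressed; a link present on both sides with a different id is still reported.

```typescript
type Doc = Record<string, unknown>;

// Assumed shape of a shallow reference: exactly one key, a string `id`.
function isShallowRef(v: unknown): boolean {
  return (
    typeof v === "object" &&
    v !== null &&
    !Array.isArray(v) &&
    Object.keys(v).length === 1 &&
    typeof (v as { id?: unknown }).id === "string"
  );
}

// linksToMany: an array of shallow refs counts as shallow, and so does []
// because every() is true for an empty array.
function isShallowLink(v: unknown): boolean {
  return Array.isArray(v) ? v.every(isShallowRef) : isShallowRef(v);
}

// Returns the top-level fields that differ between the live and generated docs.
function diffSearchDocs(live: Doc, generated: Doc, ignoreShallowLinks: boolean): string[] {
  const keys = Array.from(new Set([...Object.keys(live), ...Object.keys(generated)]));
  return keys.filter((key) => {
    if (key === "_cardType") return false; // excluded from the comparison
    // Intended difference: live omits an unused link, generated keeps { id }.
    if (ignoreShallowLinks && !(key in live) && isShallowLink(generated[key])) return false;
    return JSON.stringify(live[key]) !== JSON.stringify(generated[key]);
  });
}
```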
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers the relative-`links.self` path — without resolving the reference first
the targeted load fails and the link degrades to { id }.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Adds a new searchable-driven search-doc generator (searchDocFromFields) alongside a parity harness (host integration tests + realm-scale diff script) to validate behavior before the eventual cutover from the existing store-driven searchDoc.
Changes:
- Introduces `searchDocFromFields(instance)` in `packages/base/card-api.gts`, generating search docs by following `field.searchable` routes and targeted-loading link targets.
- Adds a host integration test suite covering route semantics, cycle clipping, missing/broken links, declared-type link enumeration, and a differential parity check.
- Adds a realm-server script to diff live store-driven docs against generated docs, with an option to ignore known shallow-link differences.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/base/card-api.gts | Adds searchDocFromFields and the searchable-driven recursion + targeted loading logic |
| packages/host/tests/integration/searchable-search-doc-test.gts | Adds integration coverage for searchable-driven search-doc generation and parity expectations |
| packages/host/tests/helpers/base-realm.ts | Exposes searchDoc and searchDocFromFields from the base realm test helper |
| packages/realm-server/scripts/searchable-parity-diff.ts | Adds a CLI script to diff live boxel_index.search_doc output against generated docs |
Postgres jsonb and JS object construction can emit the same data with different key order; a plain `JSON.stringify` comparison reported those as false divergences. Serialize with sorted keys at every level before comparing (array order preserved).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
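A dependency-free sketch of that key-order-insensitive serialization (a later commit switches the tool to `safe-stable-stringify`; `canonicalize` and `stableStringify` here are illustrative names): object keys are sorted at every level while array order is preserved.

```typescript
// Rebuild objects with sorted keys, recursively; arrays keep their order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) sorted[key] = canonicalize(source[key]);
    return sorted;
  }
  return value;
}

// Two docs holding the same data serialize identically regardless of key order.
function stableStringify(value: unknown): string {
  return JSON.stringify(canonicalize(value));
}
```

JS objects always list integer-like keys first in ascending numeric order, so such keys are not strictly lexicographic here, but the ordering is still deterministic on both sides of the comparison.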
Completes the generator unit matrix: an array `searchable` on a link expands each named route on the target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Move searchDocFromFields + helpers out of card-api.gts into a dedicated
packages/base/searchable.ts module; card-api re-exports it.
- Accept `searchable: false` (explicit "not searchable"); harden both the
generator's route seeding and the definition-build validator against
malformed annotations (false / null / non-string array entry / empty string
/ empty array) so they degrade to {id} instead of throwing.
- Rebuild the integration suite into 39 cases: the four contains/containsMany ×
linksTo/linksToMany combinations, multi-segment and shared-ancestor routes,
linksToMany broken/missing/empty/deep/cycle, multi-card cycles, query-backed
skip, declared-type enumeration, and malformed/impossible paths.
- searchable-parity-diff: use safe-stable-stringify; still report a changed
reference id under --ignore-shallow-links (only omit-vs-keep-{id} is
suppressed); export the differ functions + add a unit test; guard main() so
importing it for tests runs no file I/O.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CXfbXyUtT29nsUhSLtMQVX
- Remove the searchDocFromFields re-export from card-api. It pulled the
non-authoritative generator into every card's dependency closure (caught by
realm-indexing's dep-closure assertion). The generator is indexer-side
tooling; consumers import it directly from ./searchable, which also removes
the card-api↔searchable import cycle.
- Revert the searchable option to `true | string | string[]` — there is no
"not searchable" value. `false` is treated purely as bad input: route
seeding and the definition-build validator both degrade it to {id} without
throwing.
- Test helper loads searchDocFromFields from the searchable module directly.
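A minimal sketch of route seeding that tolerates bad input, assuming the `true | string | string[]` annotation type described in the list above. `seedRoutes` is a hypothetical name; treating dotted segments as relative to the annotated link, and dropping only the malformed entries of an array rather than the whole annotation, are assumptions. An empty result means the link stays `{ id }`.

```typescript
// A route lists link-field names below the annotated link; [] is the link itself.
type Route = string[];

// Hypothetical seeding for `searchable?: true | string | string[]`.
function seedRoutes(searchable: unknown): Route[] {
  // A usable path is a non-empty string, split into its dotted segments.
  const fromPath = (entry: unknown): Route[] =>
    typeof entry === "string" && entry.length > 0 ? [entry.split(".")] : [];

  if (searchable === true) return [[]]; // the self link
  if (Array.isArray(searchable)) {
    // Combine every well-formed entry; non-strings and "" are skipped (assumption).
    return searchable.flatMap(fromPath);
  }
  // false, null, "", numbers, objects: no routes, so the link degrades to { id }.
  return fromPath(searchable);
}
```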
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CXfbXyUtT29nsUhSLtMQVX
Builds the searchable-driven search-doc generator and the parity tooling without making it authoritative: store-driven `searchDoc` stays in charge of production indexing until the cutover (CS-11724). Stacked on #5350 (CS-11721).

Generator: `searchDocFromFields(instance)` (`packages/base/card-api.gts`)
Parallel to `searchDoc`. Derives link depth from the explicit `searchable` annotation instead of from what the render loaded, and loads the named link targets itself (targeted loading) rather than relying on store residency.
- `searchable`: `true` makes the immediate ("self") link searchable; a dotted path makes a deeper (n+1) link searchable; arrays combine.
- The owner's store is threaded through contained values (a contained `FieldDef` value may not be store-associated, but a link reached through it must still load against the owner's store). Targeted loading degrades a missing/broken/unloadable target to `{ id }`.
- Cycle clipping, `{ id }` for unfollowed/broken/not-loaded links, `linksToMany` id normalization, and the query-backed-field skip are preserved from the store-driven path. Link targets enumerate their declared type, dropping unqueryable polymorphic-subtype bloat.

Parity
- Differential test (`packages/host/tests/integration/searchable-search-doc-test.gts`): the searchable-driven path follows a searchable link to the same target, with the same contained data, that the store-driven render loaded. Whole-doc byte-equality is intentionally not asserted here: the new spec keeps `{ id }` for every relationship, while the store-driven path omits unused links via `usedLinksToFieldsOnly`. Reconciling that to an identical doc is the cutover's gate, after the migration (CS-11723) reproduces today's depth.
- `searchable-parity-diff.ts` (`packages/realm-server/scripts`): the realm-scale post-migration validator. It diffs a realm's live store-driven search docs (`boxel_index`) against the searchable-driven output per card, ignoring `_cardType` and (optionally) the intended shallow-link difference. Data-gathering is documented inline.
- Baseline: the `ctse/military-pigeon` render-vs-`searchDoc` split was captured from staging (≈ 933:1) and recorded on the ticket for the follow-on (prerender off the hot path) to measure against.

Tests (host integration, run in CI)
`searchable-search-doc-test.gts`: 9 cases covering the routes-come-only-from-the-indexed-card trio (a target's own `searchable` is dormant when pulled in, honored only when the target is itself indexed, and a dotted route on the indexer drives the deeper expansion), self link, n+1 route, contains-routing through a contained value, cycle clip → `{ id }`, missing/broken link → `{ id }`, unannotated link → `{ id }`, declared-type enumeration dropping subtype bloat, and the differential expansion-parity check.

Verified: glint + `tsc` clean (base/host/runtime-common/realm-server), eslint clean, and all 9 integration tests green on a live `test-services:host` stack.