Why
UbiReader._scan_ubi_events and _scan_ubi_queries each issue a single search_batch call (one size-limited query). To stay under the engine's result window, they currently cap results at 10 000 rows per (target, window). On a dense cluster (millions of events per month) that is a sample, not the full traffic: CTR and dwell ratings are derived from the first 10k matching events rather than from all of them. The module docstring and DEFAULT_MAX_EVENTS document this limitation.
What this delivers
Replaces UbiReader's 10 000-row sample cap with exact full-traffic aggregation of UBI events (and queries) over a window, so CTR/dwell judgments on a dense cluster reflect all matching traffic, not just the first 10k.
- SearchAdapter.scan_all + close_scan, returning an opaque, round-tripped cursor. The reader never branches on engine, which keeps Absolute Rule #4 intact.
- ES + OpenSearch: Point-in-Time (PIT) + search_after over [timestamp, _shard_doc].
  - One implementation, with an internal branch for the ES-vs-OpenSearch PIT endpoint.
  - Each page rotates the PIT id and renews keep_alive.
  - A narrow fallback (405/501/400-unsupported only) applies when PIT is unavailable.
  - A sampled fallback applies when there is no safe tiebreaker. It never falls back to an unsafe _id sort.
- Solr: cursorMark over a uniqueKey-terminated sort.
- UbiReader loops over scan_all, aggregates incrementally, and closes the cursor in a finally block. No PIT leaks on error or on an early exit at the ceiling.
- A configurable scan ceiling sourced from Settings, applied on every caller path. Truncation is logged (ubi_reader_scan_truncated), never silent.
- Large query_id sets are chunked so that no oversized Solr URL or ES terms filter is emitted.
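The reader loop described above can be sketched as follows. This is a minimal sketch under several assumptions:
- Only scan_all, close_scan, the opaque cursor and the ubi_reader_scan_truncated log name come from this issue.
- ScanPage, ScanResult, aggregate_events, the doc_id hit field, the done flag and the in-memory ListAdapter are hypothetical illustrations.
- The real Protocol signatures live in the feature spec.

The sketch shows three behaviors. The ceiling is cut exactly, even mid-page. Truncation is logged. The cursor is closed in finally on every exit path.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol

log = logging.getLogger("ubi_reader")


@dataclass(frozen=True)
class ScanPage:
    hits: list[dict[str, Any]]
    cursor: object  # opaque: the reader only hands it back to the adapter
    done: bool      # hypothetical flag: adapter says the stream is exhausted


class SearchAdapter(Protocol):
    # Hypothetical signatures; the real Protocol is defined in the feature spec.
    def scan_all(self, target: str, query: dict[str, Any], page_size: int,
                 cursor: Optional[object] = None) -> ScanPage: ...

    def close_scan(self, cursor: object) -> None: ...


@dataclass
class ScanResult:
    counts: Counter = field(default_factory=Counter)
    seen: int = 0
    truncated: bool = False


def aggregate_events(adapter: SearchAdapter, target: str, query: dict[str, Any],
                     page_size: int, max_events: int) -> ScanResult:
    """Count events per (hypothetical) doc_id across all pages, capped at max_events."""
    result = ScanResult()
    cursor: Optional[object] = None
    try:
        while True:
            page = adapter.scan_all(target, query, page_size, cursor)
            cursor = page.cursor  # keep the latest, possibly rotated, cursor
            remaining = max_events - result.seen
            batch = page.hits[:remaining]  # exact ceiling, even mid-page
            result.counts.update(hit["doc_id"] for hit in batch)
            result.seen += len(batch)
            if len(page.hits) > remaining or (result.seen >= max_events and not page.done):
                result.truncated = True
                break
            if page.done or not page.hits:
                break
    finally:
        # Runs on normal exhaustion, ceiling early-exit, and mid-scan errors.
        if cursor is not None:
            adapter.close_scan(cursor)
    if result.truncated:
        log.warning("ubi_reader_scan_truncated target=%s max_events=%d", target, max_events)
    return result


class ListAdapter:
    """In-memory stand-in for exercising the loop; the cursor is a plain offset."""

    def __init__(self, docs: list[dict[str, Any]]) -> None:
        self.docs = docs
        self.closed: list[object] = []

    def scan_all(self, target, query, page_size, cursor=None):
        start = 0 if cursor is None else cursor
        hits = self.docs[start:start + page_size]
        end = start + len(hits)
        return ScanPage(hits, end, end >= len(self.docs))

    def close_scan(self, cursor):
        self.closed.append(cursor)
```

Two design choices are worth calling out. Keeping the cursor open until close_scan lets an engine-backed adapter release its PIT even when the final page arrives. Slicing page.hits against the remaining budget is what makes a ceiling of 15 with a page size of 10 stop at exactly 15 events.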
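For the ES + OpenSearch path, the per-page request can be sketched as pure functions that build (method, path, body) tuples, so no HTTP client is needed.

This is a sketch, not the project's adapter, and it rests on these assumptions:
- The endpoint paths and the id/pit_id response fields reflect my reading of the public Elasticsearch and OpenSearch point-in-time docs. They should be checked against the engine versions used in CI.
- PitCursor, open_pit_request, page_request and advance are hypothetical names.
- The [timestamp, _shard_doc] sort is taken from this issue.

The sketch shows three things. There is a single endpoint branch. keep_alive is renewed on every page. The PIT id is rotated whenever a response returns a new one.

```python
from dataclasses import dataclass
from typing import Any, Literal, Optional

Flavor = Literal["elasticsearch", "opensearch"]
Request = tuple[str, str, Optional[dict[str, Any]]]

# Sort named in this issue; _shard_doc is the PIT tiebreaker.
SORT = [{"timestamp": "asc"}, {"_shard_doc": "asc"}]


def open_pit_request(flavor: Flavor, index: str, keep_alive: str = "1m") -> Request:
    # The one internal branch: ES and OpenSearch expose PIT at different endpoints.
    if flavor == "elasticsearch":
        return "POST", f"/{index}/_pit?keep_alive={keep_alive}", None
    return "POST", f"/{index}/_search/point_in_time?keep_alive={keep_alive}", None


def pit_id_from_open(flavor: Flavor, resp: dict[str, Any]) -> str:
    return resp["id"] if flavor == "elasticsearch" else resp["pit_id"]


def close_pit_request(flavor: Flavor, pit_id: str) -> Request:
    if flavor == "elasticsearch":
        return "DELETE", "/_pit", {"id": pit_id}
    return "DELETE", "/_search/point_in_time", {"pit_id": [pit_id]}


@dataclass(frozen=True)
class PitCursor:
    flavor: Flavor
    pit_id: str
    search_after: Optional[list[Any]] = None


def page_request(cur: PitCursor, query: dict[str, Any], size: int,
                 keep_alive: str = "1m") -> Request:
    body: dict[str, Any] = {
        "size": size,
        "query": query,
        "pit": {"id": cur.pit_id, "keep_alive": keep_alive},  # renewed on every page
        "sort": list(SORT),
    }
    if cur.search_after is not None:
        body["search_after"] = cur.search_after
    # A PIT search names no index in the path; the PIT already pins it.
    return "POST", "/_search", body


def advance(cur: PitCursor, resp: dict[str, Any]) -> tuple[PitCursor, list[dict[str, Any]]]:
    hits = resp["hits"]["hits"]
    new_id = resp.get("pit_id", cur.pit_id)  # the engine may hand back a new id
    after = hits[-1]["sort"] if hits else cur.search_after
    return PitCursor(cur.flavor, new_id, after), hits
```

A scan would work like this:
1. Open a PIT.
2. Alternate page_request and advance until a page comes back empty.
3. Always send close_pit_request with the latest id, not the one returned at open time.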
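The Solr path relies on Solr's standard cursorMark contract:
- Start from `*`.
- Sort on a field list that ends with the uniqueKey.
- Pass back each response's nextCursorMark.
- Stop when the returned mark equals the one sent.

Below is a minimal sketch with an injected fetch callable, so it runs without a Solr instance. The function name, the timestamp sort field, the id uniqueKey default and the fq value are assumptions for illustration.

```python
from typing import Any, Callable, Iterator

Fetch = Callable[[dict[str, str]], dict[str, Any]]


def solr_cursor_pages(fetch: Fetch, q: str, fq: str, rows: int,
                      unique_key: str = "id",
                      ts_field: str = "timestamp") -> Iterator[list[dict[str, Any]]]:
    """Yield pages of docs via cursorMark until Solr hands back the same mark."""
    mark = "*"  # cursorMark for the first page
    while True:
        params = {
            "q": q,
            "fq": fq,
            "rows": str(rows),
            # cursorMark requires the sort to end on the uniqueKey field.
            "sort": f"{ts_field} asc,{unique_key} asc",
            "cursorMark": mark,
        }
        resp = fetch(params)
        docs = resp["response"]["docs"]
        if docs:
            yield docs
        next_mark = resp["nextCursorMark"]
        if next_mark == mark:  # an unchanged mark means no more results
            return
        mark = next_mark
```

Unlike the PIT path, a Solr cursor holds no server-side state, so there is nothing to release. close_scan can be a no-op for this adapter.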
Definition of done
- scan_all and close_scan are in the SearchAdapter Protocol and covered by the shape test (AC-1).
- ES/OpenSearch paginate the full stream via PIT + search_after, with:
  - id rotation and keep_alive renewal (AC-2, AC-2b)
  - the OpenSearch PIT endpoint (AC-10)
  - the narrow fallback (AC-3/AC-11)
  - the sampled fallback without an unsafe _id sort (AC-3b)
- Solr paginates via cursorMark (AC-4).
- UbiReader aggregates across all pages, i.e. full traffic (AC-5). The ceiling is enforced exactly, even when it is not page-aligned (AC-13), and truncation is logged (AC-6).
- The PIT is never leaked on a mid-scan error (AC-8) or on a ceiling early-exit (AC-9). The read-only invariant holds, including PIT open/close (AC-7).
- The Settings ceiling applies on every caller path (AC-12). Large query_id sets are chunked, with no oversized filter (FR-7/AC-14).
- No migration. AC-1..AC-14 are green in CI, plus the rung-3 real-engine paginated scenario (ES + Solr).
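The narrow-fallback rule from the criteria above works like this:
- Fall back only when PIT is genuinely unsupported; surface every other error.
- In the fallback, prefer a safe tiebreaker and never sort on _id.

A minimal sketch of that rule follows. The 405/501/400 split comes from this issue. The 400 message markers, the ScanError type, the event_id field in the usage and the returned tuple shape are all hypothetical and would need tuning against real engine responses.

```python
from typing import Optional

# Hypothetical substrings for a 400 that means "PIT not supported here";
# tune these against real engine responses before relying on them.
UNSUPPORTED_400_MARKERS = ("no handler found", "unsupported", "unrecognized")


class ScanError(RuntimeError):
    """Raised for PIT failures that must not be papered over by a fallback."""


def is_pit_unsupported(status: int, reason: str = "") -> bool:
    if status in (405, 501):
        return True
    if status == 400:
        text = reason.lower()
        return any(marker in text for marker in UNSUPPORTED_400_MARKERS)
    return False  # auth errors, 404s, 429s, other 5xx: surface them


def choose_fallback(status: int, reason: str,
                    tiebreaker: Optional[str]) -> tuple[str, Optional[list[str]]]:
    if not is_pit_unsupported(status, reason):
        raise ScanError(f"PIT open failed with {status}: {reason}")
    if tiebreaker and tiebreaker != "_id":
        # A unique, sortable field keeps search_after pages disjoint.
        return "search_after", ["timestamp", tiebreaker]
    # No safe tiebreaker: take a bounded, logged sample rather than sort on _id.
    return "sampled", None
```

Keeping the 400 branch marker-based means that a malformed query, which also returns 400, still fails loudly instead of silently degrading to a sample.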
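The query_id chunking criterion (FR-7/AC-14) can be illustrated with a small helper. It de-duplicates the ids, sorts them for deterministic batches, and emits one bounded filter per chunk. The chunk sizes, the query_id field name and the helper names are hypothetical.

The Solr variant uses Solr's terms query parser (`{!terms f=...}`) with its default comma separator, so it assumes the ids contain no commas. Results from each chunk would then be merged by the caller.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def es_terms_filters(query_ids: Iterable[str], size: int = 1000) -> list[dict[str, Any]]:
    # One bounded ES terms filter per chunk instead of one huge filter.
    return [{"terms": {"query_id": c}} for c in chunked(sorted(set(query_ids)), size)]


def solr_terms_fqs(query_ids: Iterable[str], size: int = 200) -> list[str]:
    # One short fq per chunk keeps each Solr request URL bounded.
    return ["{!terms f=query_id}" + ",".join(c) for c in chunked(sorted(set(query_ids)), size)]
```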
Artifacts
- docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/idea.md
- docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/feature_spec.md
- docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/implementation_plan.md
How to execute
The spec and implementation plan are written and GPT-5.5-reviewed (adjudication logs are in the artifacts). This item is ready to implement; run:
If the linked spec/plan looks stale against main, run /impl-plan-gen (plan-accuracy audit) on docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/implementation_plan.md first.
Notes
This issue is part of the MVP2 backlog issue-coverage sweep (2026-06-02). Every active MVP2 folder should have a tracking issue so that external contributors can discover the work without grepping the planned-features tree. If you pick this up, drop a comment so others don't duplicate the work. If you find the linked idea/spec stale, run /idea-preflight first to refresh it.