
chore_ubi_reader_search_after_pagination: exact full-traffic UBI aggregation #399

@SoundMindsAI

Description


Why

UbiReader._scan_ubi_events / _scan_ubi_queries each issue exactly one
search_batch call (a single size-limited query). To stay under the engine's
result window, each call is currently capped at 10,000 rows per (target, window).
On a dense cluster (millions of events per month) that is a sample, not the
full traffic: CTR/dwell ratings are derived from the first 10,000 matching
events rather than from all of them. The module docstring and DEFAULT_MAX_EVENTS
document this limitation.

Status

What this delivers

Replaces UbiReader's 10,000-row sample cap with exact full-traffic aggregation of UBI events (and queries) over a window, so CTR/dwell judgments on a dense cluster reflect all matching traffic, not just the first 10,000 events.

  • A new engine-neutral SearchAdapter.scan_all + close_scan pair that returns an opaque, round-tripped cursor (the reader never branches on engine, which keeps Absolute Rule #4 intact):
    • ES + OpenSearch: Point-in-Time + search_after over [timestamp, _shard_doc] (one impl, with an internal branch for the ES-vs-OpenSearch PIT endpoint); rotates the PIT id and renews keep_alive on each page; a narrow fallback when PIT is unavailable (triggered only by 405, 501, or a 400 "unsupported" response), and a sampled fallback when there's no safe tiebreaker, never an unsafe _id sort.
    • Solr: cursorMark over a uniqueKey-terminated sort.
  • UbiReader loops scan_all, aggregates incrementally, and closes the cursor in finally (no PIT leak on error or on ceiling early-exit).
  • A configurable scan ceiling sourced from Settings, applied at every caller path; truncation is logged (ubi_reader_scan_truncated), never silent.
  • Large query_id sets are chunked so no oversized Solr URL / ES terms filter is emitted.
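A minimal sketch of what the SearchAdapter additions could look like, assuming a `ScanPage` return type with `hits`, `cursor`, and `done` fields (hypothetical names; the issue only specifies scan_all, close_scan, and an opaque cursor). The caller passes back whatever cursor it received and never inspects it:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class ScanPage:
    """One page of a full-traffic scan (hypothetical shape)."""

    hits: list[dict[str, Any]]
    # Opaque, engine-specific continuation state (e.g. a PIT id plus the
    # last sort values, or a Solr cursorMark). Callers only round-trip it.
    cursor: Any
    # True once the engine reports no further pages.
    done: bool


@runtime_checkable
class SearchAdapter(Protocol):
    """Engine-neutral scan surface; the caller never inspects `cursor`."""

    def scan_all(
        self,
        target: str,
        query: dict[str, Any],
        page_size: int,
        cursor: Any = None,
    ) -> ScanPage:
        """Return the next page; cursor=None starts a new scan."""
        ...

    def close_scan(self, cursor: Any) -> None:
        """Release server-side state (e.g. a PIT); must accept None."""
        ...
```

Typing the cursor as `Any` is what lets each adapter carry engine-specific state without the reader ever branching on engine.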
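For ES/OpenSearch, the per-page mechanics reduce to two pure steps: build the next request body from the cursor, then fold the response back into a new cursor. This is a sketch under stated assumptions: the cursor is a hypothetical dict with `pit_id`, `search_after`, and `done`; the event field is named `timestamp`; and a short page is treated as the end of the stream. The PIT open/close paths are the public ES (`/_pit`) and OpenSearch (`/_search/point_in_time`) endpoints; the HTTP transport, the differing close-request bodies, and the 405/501/400 fallback are left out:

```python
from __future__ import annotations

from typing import Any

# Sort on the event timestamp, tie-broken by _shard_doc, which is only
# meaningful inside a point-in-time (PIT) search.
PIT_SORT = [{"timestamp": "asc"}, {"_shard_doc": "asc"}]


def pit_endpoints(engine: str, index: str, keep_alive: str) -> tuple[str, str]:
    """(open, close) paths; ES and OpenSearch expose PIT at different URLs."""
    if engine == "elasticsearch":
        return f"/{index}/_pit?keep_alive={keep_alive}", "/_pit"
    if engine == "opensearch":
        return (
            f"/{index}/_search/point_in_time?keep_alive={keep_alive}",
            "/_search/point_in_time",
        )
    raise ValueError(f"unsupported engine: {engine}")


def build_page_body(
    cursor: dict[str, Any], query: dict[str, Any], page_size: int, keep_alive: str
) -> dict[str, Any]:
    """Body for POST /_search (the index is omitted from the path with a PIT)."""
    body: dict[str, Any] = {
        "size": page_size,
        "query": query,
        # Renew keep_alive on every page so a slow scan does not expire the PIT.
        "pit": {"id": cursor["pit_id"], "keep_alive": keep_alive},
        "sort": PIT_SORT,
        "track_total_hits": False,
    }
    if cursor.get("search_after") is not None:
        body["search_after"] = cursor["search_after"]
    return body


def advance_cursor(
    cursor: dict[str, Any], response: dict[str, Any], page_size: int
) -> dict[str, Any]:
    """Fold one search response into the next opaque cursor."""
    hits = response["hits"]["hits"]
    return {
        # Rotate: always continue with the most recent PIT id returned.
        "pit_id": response.get("pit_id", cursor["pit_id"]),
        # The last hit's sort values become the next search_after.
        "search_after": hits[-1]["sort"] if hits else cursor.get("search_after"),
        # A short (or empty) page means the stream is exhausted.
        "done": len(hits) < page_size,
    }
```

Keeping the final `pit_id` in the cursor even after `done` is set means close_scan still has the id it needs to release the PIT.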
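The Solr path can be sketched the same way. This assumes `id` is the collection's uniqueKey and injects a hypothetical `fetch` callable (request params in, parsed JSON response out) in place of a real HTTP client. Termination follows Solr's documented cursorMark rule: the scan is complete when `nextCursorMark` equals the `cursorMark` that was sent:

```python
from __future__ import annotations

from typing import Any, Callable, Iterator

# cursorMark requires the sort to end on the collection's uniqueKey
# (assumed here to be "id") so every document has a total order.
DEFAULT_SORT = "timestamp asc,id asc"


def solr_cursor_scan(
    fetch: Callable[[dict[str, Any]], dict[str, Any]],
    q: str,
    rows: int,
    sort: str = DEFAULT_SORT,
) -> Iterator[dict[str, Any]]:
    """Yield every matching doc by following nextCursorMark until it stops moving."""
    cursor = "*"  # "*" starts a new cursor; start must stay 0 / unset.
    while True:
        resp = fetch({"q": q, "rows": rows, "sort": sort, "cursorMark": cursor})
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        # Solr signals the end by echoing back the cursorMark it was sent.
        if next_cursor == cursor:
            return
        cursor = next_cursor
```

Unlike a PIT, a Solr cursor holds no server-side state, so close_scan can be a no-op for this adapter.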
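A sketch of the reader-side loop, assuming a duck-typed adapter whose pages expose `hits`, `cursor`, and `done`, and hypothetical UBI field names (`query_id`, `object_id`, `action_name`). It counts events page by page, stops at exactly `max_events` even mid-page, logs `ubi_reader_scan_truncated` only when an event beyond the ceiling actually exists, and closes the cursor in `finally`:

```python
from __future__ import annotations

import logging
from collections import Counter
from typing import Any

logger = logging.getLogger("ubi_reader")


def aggregate_events(
    adapter: Any,
    target: str,
    query: dict[str, Any],
    page_size: int,
    max_events: int,
) -> tuple[Counter, int, bool]:
    """Aggregate (query_id, object_id, action_name) counts over every page.

    Stops at exactly max_events even when that falls mid-page, and always
    releases the scan cursor: on success, on error, and on early exit.
    """
    counts: Counter = Counter()
    seen = 0
    truncated = False
    cursor = None
    try:
        while True:
            page = adapter.scan_all(target, query, page_size, cursor)
            cursor = page.cursor  # keep the latest cursor for close_scan
            for hit in page.hits:
                if seen >= max_events:
                    truncated = True  # at least one event beyond the ceiling
                    break
                counts[(hit["query_id"], hit["object_id"], hit["action_name"])] += 1
                seen += 1
            if truncated or page.done:
                break
    finally:
        adapter.close_scan(cursor)
    if truncated:
        logger.warning(
            "ubi_reader_scan_truncated target=%s max_events=%d", target, max_events
        )
    return counts, seen, truncated
```

Checking for a surplus event, rather than `seen == max_events`, avoids logging a truncation when the ceiling happens to coincide with the end of the stream, at the cost of fetching at most one extra page.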
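Chunking large query_id sets can be as simple as the helpers sketched here. Chunk sizes are left to the caller, the field is assumed to be `query_id`, and the Solr variant uses the `{!terms}` query parser (assuming ids contain no commas, its default separator). Each chunk is meant to drive its own request, with results merged by the reader: stacking chunks as several Solr `fq` params, or as several `terms` clauses under a bool `filter`, would AND them together instead of OR-ing them:

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def es_terms_filters(query_ids: Iterable[str], chunk_size: int) -> Iterator[dict[str, Any]]:
    """One bounded `terms` filter per chunk (ES / OpenSearch query DSL)."""
    # Dedupe and sort so chunk boundaries are deterministic across runs.
    for chunk in chunked(sorted(set(query_ids)), chunk_size):
        yield {"terms": {"query_id": chunk}}


def solr_terms_fqs(query_ids: Iterable[str], chunk_size: int) -> Iterator[str]:
    """One bounded filter query per chunk, using Solr's {!terms} parser."""
    for chunk in chunked(sorted(set(query_ids)), chunk_size):
        yield "{!terms f=query_id}" + ",".join(chunk)
```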

Definition of done

  • scan_all + close_scan are in the SearchAdapter Protocol + shape test (AC-1)
  • ES/OpenSearch paginate the full stream via PIT + search_after with id-rotation + keep_alive (AC-2, AC-2b), OpenSearch PIT endpoint (AC-10), narrow fallback (AC-3/AC-11), sampled fallback without an unsafe _id sort (AC-3b)
  • Solr paginates via cursorMark (AC-4)
  • UbiReader aggregates across all pages = full traffic (AC-5); ceiling enforced exactly even when not page-aligned (AC-13) and logged (AC-6)
  • PIT never leaked on mid-scan error (AC-8) or ceiling early-exit (AC-9); read-only invariant holds incl. PIT open/close (AC-7)
  • Settings ceiling applies on every caller path (AC-12); large query_id sets chunked, no oversized filter (FR-7/AC-14)
  • No migration; AC-1..AC-14 green in CI + the rung-3 real-engine paginated scenario (ES + Solr)

Artifacts

How to execute

The spec + implementation plan are written and GPT-5.5-reviewed (adjudication logs in the artifacts). This item is ready to implement — run:

/impl-execute docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/implementation_plan.md --all

If the linked spec/plan looks stale against main, run /impl-plan-gen (plan-accuracy audit) on docs/00_overview/planned_features/02_mvp2/chore_ubi_reader_search_after_pagination/implementation_plan.md first.

Notes

This issue is part of the MVP2 backlog issue-coverage sweep (2026-06-02): every active MVP2 folder should have a tracking issue so external contributors can discover the work without grepping the planned-features tree. If you pick this up, drop a comment so others don't duplicate it; if you find the linked idea/spec stale, run /idea-preflight first to refresh it.

Metadata


Assignees

No one assigned

    Labels

    mvp2 (MVP2 backlog item)
    priority/P2 (P2: important to file, not blocking)
    ready-to-execute (Has approved spec + impl plan; ready for /impl-execute)
    type/chore (Chore: non-feature cleanup)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
