Phase0 6/bench harness tiers #300

Merged
thorrester merged 5 commits into main from phase0_6/bench-harness-tiers
May 14, 2026

@thorrester thorrester commented May 14, 2026

Pull Request

Short Summary

Adds a tiered OLAP benchmark harness with committed Tier 0 baseline artifacts and wires Phase 0 observability instrumentation through the full trace query stack — object store → Delta table lifecycle → DataFusion query pipeline → HTTP route handlers. The harness is the mechanism that makes the instrumentation measurable at PR time.

Context

Tiered benchmark harness (benches/tiers.rs, 1077 lines)

New BenchTier enum (Tier0/1/2) gates every benchmark group via tier_guard_for(bench, group), which reads SCOUTER_BENCH_TIER from env. Three make targets drive the tiers:

  • make bench.core — Tier 0, PR-blocking smoke suite (~50–120s per group)
  • make bench.extended — Tier 1, CI-gated extended load tests
  • make bench.certification — Tier 2, full certification against object storage

Each benchmark binary that was previously monolithic now calls tier_guard_for at the top of every group function, so a Tier 0 run only executes T0 groups and exits fast. Criterion still does the measuring; the tier system just controls which groups run and writes JSON artifacts to bench_metrics/.
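The gating logic can be sketched as follows. This is a minimal illustration, not the code in benches/tiers.rs: the "0"/"1"/"2" encoding of SCOUTER_BENCH_TIER, the default of Tier 0, and the helper name should_run are assumptions, and the real tier_guard_for(bench, group) signature is not reproduced here.

```rust
use std::env;

/// Benchmark tiers, ordered from cheapest to most expensive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum BenchTier {
    Tier0,
    Tier1,
    Tier2,
}

impl BenchTier {
    /// Parse a tier from "0", "1", or "2" (assumed encoding of the env var).
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim() {
            "0" => Some(Self::Tier0),
            "1" => Some(Self::Tier1),
            "2" => Some(Self::Tier2),
            _ => None,
        }
    }

    /// Read the selected tier from SCOUTER_BENCH_TIER.
    /// Assumption: an unset or unparsable value falls back to Tier 0.
    pub fn from_env() -> Self {
        env::var("SCOUTER_BENCH_TIER")
            .ok()
            .and_then(|raw| Self::parse(&raw))
            .unwrap_or(Self::Tier0)
    }
}

/// Hypothetical guard: returns true when a group tagged `group_tier`
/// should run under the `selected` tier.
/// Assumption: a run executes only the groups of its own tier.
pub fn should_run(selected: BenchTier, group_tier: BenchTier) -> bool {
    selected == group_tier
}
```

A group function would call something like `if !should_run(BenchTier::from_env(), BenchTier::Tier0) { return; }` before registering its Criterion benchmarks, which is what lets a Tier 0 run exit fast.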

Committed Tier 0 baseline artifacts (bench_metrics/*.json)

Four JSON files are checked in as the initial Tier 0 baseline:

| Artifact | Entrypoint | p50 end-to-end | Result rows |
|---|---|---|---|
| t0_bifrost_smoke | dataset_engine_manager.query | 927 µs | 256 |
| t0_cold_query_smoke | trace_query_service.query_spans | 3,498 µs | 5 |
| t0_hot_path_cold_query_smoke | trace_query_service.query_spans | 4,059 µs | 5 |
| t0_refresh_origin_sentinel | | | 0 (guards refresh accounting) |

make bench.core also runs bench_compare, a binary that loads these committed artifacts and fails if the current run regresses on bench.query.end_to_end or violates object-store operation counts. The sentinel exists specifically to assert that the refresh-on-request path produces zero LIST calls during normal query execution.
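A regression check of this kind might look like the sketch below. The struct layout, the relative slowdown tolerance, and the rule that the current run may not exceed the baseline's per-operation counts are all assumptions for illustration; the actual bench_compare logic and artifact schema are not shown in this PR description.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of one bench artifact.
#[derive(Debug, Clone)]
pub struct BenchArtifact {
    pub name: String,
    pub end_to_end_p50_us: f64,
    /// Object-store operation name (e.g. "list") mapped to its call count.
    pub object_store_ops: BTreeMap<String, u64>,
}

/// Convenience constructor used to keep examples short.
pub fn artifact(name: &str, p50_us: f64, ops: &[(&str, u64)]) -> BenchArtifact {
    BenchArtifact {
        name: name.to_string(),
        end_to_end_p50_us: p50_us,
        object_store_ops: ops.iter().map(|(op, n)| (op.to_string(), *n)).collect(),
    }
}

/// Compare a current run against the committed baseline.
/// Returns one message per violation; an empty vector means the run passes.
/// Assumptions: latency may grow by at most `max_slowdown` (0.10 = 10%),
/// and no operation may be called more often than in the baseline.
pub fn compare(
    baseline: &BenchArtifact,
    current: &BenchArtifact,
    max_slowdown: f64,
) -> Vec<String> {
    let mut failures = Vec::new();

    let limit = baseline.end_to_end_p50_us * (1.0 + max_slowdown);
    if current.end_to_end_p50_us > limit {
        failures.push(format!(
            "{}: p50 {:.0} µs exceeds limit {:.0} µs",
            current.name, current.end_to_end_p50_us, limit
        ));
    }

    for (op, &count) in &current.object_store_ops {
        // An operation absent from the baseline is treated as allowed zero times.
        let allowed = baseline.object_store_ops.get(op).copied().unwrap_or(0);
        if count > allowed {
            failures.push(format!(
                "{}: {} called {} times, baseline allows {}",
                current.name, op, count, allowed
            ));
        }
    }

    failures
}
```

Under these assumptions, a sentinel baseline with a `list` count of zero fails the comparison as soon as a single LIST call appears in the current run.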

Object store observability (parquet/utils.rs, ~480 new lines)

ObjectStoreRequestTelemetry is the production instrumentation primitive. Every object store call now gets an object_store.request span with:

  • object_store.operation: list, get, get_range, head, put, delete, copy
  • object_store.path_kind: delta_log, parquet_data, checkpoint, unknown
  • object_store.backend: local, s3, gcs, azure, cache
  • object_store.status: ok, error
  • object_store.cache.hit: true/false/unknown
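To illustrate how a path_kind value could be derived, here is a rough classifier sketch. It relies only on the standard Delta Lake layout (commit and checkpoint files under `_delta_log/`, Parquet data files elsewhere); the function name classify_path and the exact matching rules are assumptions and may differ from the classifier in parquet/utils.rs.

```rust
/// Values of the object_store.path_kind attribute listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathKind {
    DeltaLog,
    ParquetData,
    Checkpoint,
    Unknown,
}

impl PathKind {
    /// Attribute value as it would appear on the span.
    pub fn as_str(self) -> &'static str {
        match self {
            PathKind::DeltaLog => "delta_log",
            PathKind::ParquetData => "parquet_data",
            PathKind::Checkpoint => "checkpoint",
            PathKind::Unknown => "unknown",
        }
    }
}

/// Hypothetical classifier based on the Delta Lake on-disk layout.
pub fn classify_path(path: &str) -> PathKind {
    let in_delta_log = path.split('/').any(|segment| segment == "_delta_log");
    let file_name = path.rsplit('/').next().unwrap_or("");

    if in_delta_log {
        // Checkpoints live next to commit files inside _delta_log/.
        if file_name.contains(".checkpoint") || file_name == "_last_checkpoint" {
            PathKind::Checkpoint
        } else {
            PathKind::DeltaLog
        }
    } else if file_name.ends_with(".parquet") {
        PathKind::ParquetData
    } else {
        PathKind::Unknown
    }
}
```

Checking the `_delta_log` segment before the `.parquet` suffix matters because checkpoint files are themselves Parquet and would otherwise be counted as data files.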

Three Prometheus counters accompany the spans: scouter_trace_object_store_requests_total, scouter_trace_object_store_request_duration_ms, scouter_trace_object_store_bytes_total.

observe_object_meta_stream wraps the lazy list() stream so object metadata is counted as it arrives, not after collection.
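The difference between counting lazily and counting after collection can be shown with a synchronous stand-in. The sketch below uses a plain `Iterator` in place of the async stream returned by `list()`, and the `ObjectMeta`, `ListCounters`, and `observe_listing` names are simplified, hypothetical types rather than the ones in this PR.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Simplified stand-in for the object metadata yielded by a listing.
#[derive(Debug, Clone)]
pub struct ObjectMeta {
    pub location: String,
    pub size: u64,
}

/// Shared counters updated as items flow through the wrapper.
#[derive(Debug, Default)]
pub struct ListCounters {
    pub objects: Cell<u64>,
    pub bytes: Cell<u64>,
}

/// Wrap a lazy listing so each item is counted at the moment it is yielded.
/// Nothing is buffered: a consumer that stops early is only charged for
/// the items it actually pulled.
pub fn observe_listing<I>(
    inner: I,
    counters: Rc<ListCounters>,
) -> impl Iterator<Item = ObjectMeta>
where
    I: Iterator<Item = ObjectMeta>,
{
    inner.inspect(move |meta| {
        counters.objects.set(counters.objects.get() + 1);
        counters.bytes.set(counters.bytes.get() + meta.size);
    })
}
```

With an async stream, the same shape would typically be an adapter that inspects each item as it is polled; the key property is that the counters advance per item rather than once at the end.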

CachingStore instrumentation (caching_store.rs)

CachingStore previously passed through to the inner store silently. Every method (put_opts, put_multipart_opts, get_opts, get_range, delete_stream, list, list_with_delimiter, copy) is now wrapped with ObjectStoreRequestTelemetry, including the cache-hit path, which records cache.hit = true without an inner span.
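The cache-hit behaviour can be sketched with a toy read-through cache. Everything here is hypothetical (the `CachingReader` type, the closure standing in for the inner store, and the `inner_requests` counter standing in for inner spans); it only illustrates that a hit is reported without touching the inner store.

```rust
use std::collections::HashMap;

/// Values of the object_store.cache.hit attribute.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheHit {
    True,
    False,
    Unknown,
}

/// Toy read-through cache in front of an inner fetch function.
pub struct CachingReader<F: FnMut(&str) -> Option<Vec<u8>>> {
    cache: HashMap<String, Vec<u8>>,
    fetch_inner: F,
    /// Number of requests forwarded to the inner store.
    pub inner_requests: u64,
}

impl<F: FnMut(&str) -> Option<Vec<u8>>> CachingReader<F> {
    pub fn new(fetch_inner: F) -> Self {
        Self {
            cache: HashMap::new(),
            fetch_inner,
            inner_requests: 0,
        }
    }

    /// Return the bytes for `path` together with the cache-hit flag that
    /// the telemetry wrapper would record on the request span.
    pub fn get(&mut self, path: &str) -> (Option<Vec<u8>>, CacheHit) {
        if let Some(bytes) = self.cache.get(path) {
            // Hit: served from memory, the inner store is never called.
            return (Some(bytes.clone()), CacheHit::True);
        }
        // Miss: forward to the inner store and remember the result.
        self.inner_requests += 1;
        let fetched = (self.fetch_inner)(path);
        if let Some(bytes) = &fetched {
            self.cache.insert(path.to_string(), bytes.clone());
        }
        (fetched, CacheHit::False)
    }
}
```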

Delta table lifecycle spans (engine.rs)

Five named spans now wrap the Delta table operations that were previously invisible:

| Span | When |
|---|---|
| delta.table.load | Table probe at startup, existing-table open |
| delta.snapshot.refresh | refresh_table() background tick |
| update_incremental | Delta log catch-up inside refresh |
| delta.catalog.swap | Provider swap after write, optimize, vacuum, expire, refresh |
| delta.optimize | ZOrder compaction |

Query pipeline spans (queries.rs)

TraceQueryBuilder operations now emit spans that expose where query time goes:

df.table.resolve → df.logical.build → df.physical.plan → df.collect → arrow.convert → trace.tree.build

All carry endpoint and table attributes so you can filter by query type in Jaeger or Grafana.
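The per-stage breakdown these spans provide can be approximated with a scope guard that records elapsed time on drop. This is a standard-library stand-in for a tracing span, not the instrumentation used in queries.rs; `StageLog`, `StageGuard`, and `run_pipeline` are hypothetical names, and the two stages shown do placeholder work.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::time::{Duration, Instant};

/// Collected (stage name, elapsed time) pairs, in completion order.
#[derive(Default)]
pub struct StageLog {
    pub entries: RefCell<Vec<(&'static str, Duration)>>,
}

/// Scope guard that records how long a stage took when it is dropped.
pub struct StageGuard {
    name: &'static str,
    started: Instant,
    log: Rc<StageLog>,
}

impl StageGuard {
    pub fn enter(name: &'static str, log: &Rc<StageLog>) -> Self {
        Self {
            name,
            started: Instant::now(),
            log: Rc::clone(log),
        }
    }
}

impl Drop for StageGuard {
    fn drop(&mut self) {
        self.log
            .entries
            .borrow_mut()
            .push((self.name, self.started.elapsed()));
    }
}

/// Placeholder pipeline with two timed stages; returns the row count.
pub fn run_pipeline(log: &Rc<StageLog>) -> usize {
    let plan = {
        let _stage = StageGuard::enter("df.logical.build", log);
        vec![1u32, 2, 3]
    };
    let rows: Vec<u32> = {
        let _stage = StageGuard::enter("df.collect", log);
        plan.iter().map(|v| v * 2).collect()
    };
    rows.len()
}
```

Because each guard is dropped at the end of its block, the log shows one entry per stage, which is the same information a trace viewer presents as nested span durations.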

Phase 0 observability contract (scouter_tracing/src/tracer.rs)

phase0_observability is a new public module that centralizes route constants, span name constants, and attribute key constants. A PHASE0_SPAN_NAMES BTreeSet constant and two contract tests (phase0_span_names_are_complete_and_unique, phase0_route_contract_preserves_in_scope_trace_endpoints) will fail at compile/test time if any name is renamed or dropped without updating the contract.
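A contract check in this spirit might be written as below. The sketch uses a constant slice named `SPAN_NAMES` (a hypothetical name, chosen to avoid implying this is the real PHASE0_SPAN_NAMES definition) and builds the `BTreeSet` at runtime; the listed names are taken from this description and are not claimed to be the complete contract.

```rust
use std::collections::BTreeSet;

/// Illustrative subset of span names mentioned in this PR description.
pub const SPAN_NAMES: &[&str] = &[
    "object_store.request",
    "delta.table.load",
    "delta.snapshot.refresh",
    "delta.catalog.swap",
    "delta.optimize",
    "df.table.resolve",
    "df.logical.build",
    "df.physical.plan",
    "df.collect",
    "arrow.convert",
    "trace.tree.build",
];

/// Build the set view of the contract.
pub fn span_name_set() -> BTreeSet<&'static str> {
    SPAN_NAMES.iter().copied().collect()
}

/// True when no name appears twice in the contract list.
pub fn names_are_unique() -> bool {
    span_name_set().len() == SPAN_NAMES.len()
}

/// Names that callers rely on but that are absent from the contract.
/// A test asserting this is empty fails as soon as a span is renamed
/// or dropped without updating the contract.
pub fn missing_from_contract<'a>(required: &[&'a str]) -> Vec<&'a str> {
    let known = span_name_set();
    required
        .iter()
        .copied()
        .filter(|name| !known.contains(*name))
        .collect()
}
```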

HTTP route spans (trace/route.rs)

All five trace handlers (paginated_traces, get_trace_spans, get_trace_spans_by_id, trace_metrics, v1_otel_traces) now declare 17 Phase 0 span fields upfront (trace.query.endpoint, trace.query.kind, trace.query.window_ms, trace.query.result.rows, trace.query.cache.hit, etc.) and call record_trace_query_common + record_trace_query_result to fill them at runtime.

Minor fixes

  • TransportConfig::is_mock() replaces the SCOUTER_OFFLINE env-var check in py_queue.rs — the offline guard was incorrectly tied to an env var rather than the configured transport type.
  • AgentEvalProfile::reset_workflow_agents() extracts the workflow reset logic so py_queue.rs can call it without reaching into the workflow field directly.
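The first fix can be illustrated with a small sketch. The enum variants and their fields below are hypothetical, as is the `should_start_background_queue` helper; only the idea of asking the configured transport via is_mock(), instead of reading an env var, comes from this PR.

```rust
/// Hypothetical transport configuration; the real variants may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum TransportConfig {
    Http { server_uri: String },
    Kafka { brokers: String },
    Mock,
}

impl TransportConfig {
    /// True when the configured transport is the mock transport.
    pub fn is_mock(&self) -> bool {
        matches!(self, TransportConfig::Mock)
    }
}

/// Hypothetical call site: the decision now depends on the configured
/// transport rather than on the process environment.
pub fn should_start_background_queue(config: &TransportConfig) -> bool {
    !config.is_mock()
}
```

Deriving the guard from the config keeps behaviour consistent for a given configuration, regardless of what happens to be set in the environment.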

| File | Change |
|---|---|
| benches/tiers.rs | New tiered harness: BenchTier, tier_guard_for, JSON artifact writer, bench_compare entry point |
| bench_metrics/*.json | Four committed Tier 0 baseline artifacts |
| benches/counting_object_store.rs | ObjectStoreCounts + CountingObjectStore for bench artifact output |
| benches/dataset_benchmark.rs | Tier guards + bench_t0_bifrost_smoke |
| benches/hot_path_bench.rs | Tier guards + benchmark_t0_cold_query_smoke, seed_small_fixture |
| benches/trace_service_benchmark.rs | Tier guards on all four existing groups |
| benches/planner_bench.rs, session_config_bench.rs, stress_test.rs | Tier guards |
| src/parquet/utils.rs | ObjectStoreRequestTelemetry, ObservingObjectStore, observe_object_meta_stream, path classifier, Prometheus counters |
| src/caching_store.rs | Full ObjectStore method instrumentation via ObjectStoreRequestTelemetry |
| src/parquet/tracing/engine.rs | Delta lifecycle spans: load, refresh, swap, optimize |
| src/parquet/tracing/queries.rs | Query pipeline spans: table resolve → collect → arrow convert → tree build |
| src/parquet/tracing/summary.rs | Summary path span coverage |
| crates/scouter_tracing/src/tracer.rs | phase0_observability contract module + contract tests |
| crates/scouter_server/src/api/routes/trace/route.rs | Phase 0 span fields on all five handlers, record_trace_query_common/result |
| crates/scouter_events/src/queue/py_queue.rs | is_mock() replaces SCOUTER_OFFLINE check |
| crates/scouter_events/src/queue/types.rs | TransportConfig::is_mock() |
| crates/scouter_types/src/agent/profile.rs | AgentEvalProfile::reset_workflow_agents() |
| makefile | bench.core, bench.extended, bench.certification targets |

Is this a Breaking Change?

No. All new observability is purely additive — new spans, new Prometheus counters, new span attributes. Existing HTTP response shapes, public Rust API signatures, Python bindings, database schema, and config keys are unchanged.

@codecov-commenter
codecov-commenter commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.57%. Comparing base (597d5d7) to head (420df65).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #300   +/-   ##
=======================================
  Coverage   76.57%   76.57%           
=======================================
  Files          26       26           
  Lines         918      918           
=======================================
  Hits          703      703           
  Misses        215      215           

@thorrester thorrester marked this pull request as ready for review May 14, 2026 10:39
@thorrester thorrester merged commit a375b30 into main May 14, 2026
20 of 21 checks passed
thorrester added a commit that referenced this pull request May 14, 2026
thorrester added a commit that referenced this pull request May 14, 2026
This reverts commit a375b30.

(cherry picked from commit f294366)