Fix Phase 0.5 trace writer correctness#303

Merged
thorrester merged 1 commit into olap-serving-layer from phase0_5/correctness-writer-props
May 14, 2026


thorrester commented on May 14, 2026

Pull Request

Short Summary

Fixes the Phase 0.5 trace storage correctness issues that block bounded trace lookup. This PR replaces trace-id-only span caching with a cache key that encodes the full query shape, centralizes Parquet writer properties, adds vacuum retention safeguards, and partitions eval scenarios by the UTC date of created_at.

Context

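The old trace span cache was keyed only on trace_id. That is unsafe once callers add time bounds, service filters, limits, or payload selection: different queries against the same trace share a single cache entry, so a bounded query can be served the result of an earlier unbounded one (or the reverse). This PR makes the cache key match the query shape and bypasses the cache entirely when a query cannot be represented safely in the key.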
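To make the failure mode concrete, here is a minimal std-only sketch of a cache keyed on trace_id alone. Span, TraceQuery, scan, and cached_lookup are hypothetical simplifications rather than the scouter_dataframe types, and HashMap stands in for whatever cache the engine actually uses.

```rust
use std::collections::HashMap;

// Hypothetical, simplified span: only the field needed to show the bug.
#[derive(Clone, Debug, PartialEq)]
struct Span {
    start_time_us: i64,
}

// Hypothetical query: a trace id plus an optional lower time bound.
struct TraceQuery {
    trace_id: [u8; 16],
    start_time_us: Option<i64>,
}

// Stand-in for the storage scan: applies the time bound to stored spans.
fn scan(store: &[Span], q: &TraceQuery) -> Vec<Span> {
    store
        .iter()
        .filter(|s| q.start_time_us.map_or(true, |t| s.start_time_us >= t))
        .cloned()
        .collect()
}

// Old-style lookup: the cache key is trace_id only, so whichever query
// populates the entry first decides what every later query for that trace sees.
fn cached_lookup(
    cache: &mut HashMap<[u8; 16], Vec<Span>>,
    store: &[Span],
    q: &TraceQuery,
) -> Vec<Span> {
    cache
        .entry(q.trace_id)
        .or_insert_with(|| scan(store, q))
        .clone()
}
```

If an unbounded lookup runs first and a bounded lookup second, the second call returns every cached span even though only the spans inside its time bound should match.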

Before:

Cache<[u8; 16], Arc<Vec<TraceSpan>>>

After:

SpanCacheKey {
    trace_id,
    start_time_us,
    end_time_us,
    service_name,
    service_namespace,
    service_version,
    service_instance_id,
    limit,
    include_payloads,
}
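A typed version of that key might look like the sketch below. The field types (Option for every filter, [u8; 16] for the trace id, usize for the limit), the SpanQuery struct, and the attribute_filters field used to trigger a bypass are assumptions for illustration; the PR only names the key's fields. Deriving Hash and Eq over every field is what keeps two differently shaped queries for the same trace in separate entries, and returning None lets the caller skip the cache for shapes the key cannot express.

```rust
// Typed sketch of the key; field types are assumptions.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SpanCacheKey {
    trace_id: [u8; 16],
    start_time_us: Option<i64>,
    end_time_us: Option<i64>,
    service_name: Option<String>,
    service_namespace: Option<String>,
    service_version: Option<String>,
    service_instance_id: Option<String>,
    limit: Option<usize>,
    include_payloads: bool,
}

// Hypothetical incoming query. `attribute_filters` stands in for any
// predicate that the key does not encode.
#[derive(Default)]
struct SpanQuery {
    trace_id: [u8; 16],
    start_time_us: Option<i64>,
    end_time_us: Option<i64>,
    service_name: Option<String>,
    service_namespace: Option<String>,
    service_version: Option<String>,
    service_instance_id: Option<String>,
    limit: Option<usize>,
    include_payloads: bool,
    attribute_filters: Vec<String>,
}

impl SpanCacheKey {
    /// Returns None when the query cannot be represented by the key,
    /// which tells the caller to bypass the cache.
    fn from_query(q: &SpanQuery) -> Option<Self> {
        if !q.attribute_filters.is_empty() {
            return None;
        }
        Some(SpanCacheKey {
            trace_id: q.trace_id,
            start_time_us: q.start_time_us,
            end_time_us: q.end_time_us,
            service_name: q.service_name.clone(),
            service_namespace: q.service_namespace.clone(),
            service_version: q.service_version.clone(),
            service_instance_id: q.service_instance_id.clone(),
            limit: q.limit,
            include_payloads: q.include_payloads,
        })
    }
}
```

Returning Option<SpanCacheKey> instead of always building a key fails safe: a query option that the key does not yet cover is simply not cached.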

The writer-property changes move the repeated Parquet configuration into writer_props.rs. Trace spans, summaries, GenAI spans, dispatch records, Bifrost datasets, eval scenarios, and the control table now use explicit writer-property profiles instead of each engine carrying its own partial copy. Summary optimize now also reapplies its writer properties during compaction, fixing the known bug called out in the Phase 0.5 plan.
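A sketch of the single-source-of-truth pattern, assuming a profile enum and a plain WriterConfig struct standing in for the Parquet crate's writer properties. The profile names mirror the tables listed in this PR, but every setting value below is a placeholder; the PR does not say what the profiles contain.

```rust
// Stand-in for Parquet writer properties; all values are placeholders.
#[derive(Clone, Debug, PartialEq)]
struct WriterConfig {
    compression: &'static str,
    max_row_group_size: usize,
    dictionary_enabled: bool,
}

// One profile per table family named in the PR.
#[derive(Clone, Copy, Debug)]
enum WriterProfile {
    TraceSpans,
    TraceSummary,
    GenAiSpans,
    Dispatch,
    BifrostDynamic,
    EvalScenarios,
    Control,
}

// The only place writer settings are defined.
fn writer_config(profile: WriterProfile) -> WriterConfig {
    match profile {
        WriterProfile::TraceSpans | WriterProfile::GenAiSpans => WriterConfig {
            compression: "zstd",
            max_row_group_size: 128 * 1024,
            dictionary_enabled: true,
        },
        WriterProfile::TraceSummary => WriterConfig {
            compression: "zstd",
            max_row_group_size: 64 * 1024,
            dictionary_enabled: true,
        },
        WriterProfile::Dispatch | WriterProfile::Control => WriterConfig {
            compression: "snappy",
            max_row_group_size: 16 * 1024,
            dictionary_enabled: false,
        },
        WriterProfile::BifrostDynamic | WriterProfile::EvalScenarios => WriterConfig {
            compression: "zstd",
            max_row_group_size: 32 * 1024,
            dictionary_enabled: false,
        },
    }
}

// Append and optimize both request the same profile, so compaction
// cannot silently fall back to default writer settings.
fn summary_write_config() -> WriterConfig {
    writer_config(WriterProfile::TraceSummary)
}

fn summary_optimize_config() -> WriterConfig {
    writer_config(WriterProfile::TraceSummary)
}
```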

Vacuum calls now read their retention window from SCOUTER_TRACE_VACUUM_RETENTION_HOURS (default: 24 hours). A value of retention_hours=0 is rejected unless SCOUTER_TRACE_UNSAFE_VACUUM_ALLOW_ZERO=true.
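The retention rules can be sketched with std::env alone. Resolution is split into a pure function so it can be tested without mutating the process environment. The function names are hypothetical, and rejecting unparsable values and accepting only the exact string "true" for the override are assumptions of this sketch rather than documented behavior.

```rust
use std::env;

const DEFAULT_RETENTION_HOURS: u64 = 24;

/// Pure resolution logic: raw env values in, validated hours out.
fn resolve_retention_hours(
    raw_hours: Option<&str>,
    raw_allow_zero: Option<&str>,
) -> Result<u64, String> {
    let hours = match raw_hours {
        None => DEFAULT_RETENTION_HOURS,
        Some(s) => s
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("invalid SCOUTER_TRACE_VACUUM_RETENTION_HOURS={s:?}: {e}"))?,
    };
    // Zero retention deletes unreferenced files immediately, so it needs an explicit opt-in.
    let allow_zero = raw_allow_zero.map_or(false, |v| v.trim() == "true");
    if hours == 0 && !allow_zero {
        return Err(
            "retention_hours=0 requires SCOUTER_TRACE_UNSAFE_VACUUM_ALLOW_ZERO=true".to_string(),
        );
    }
    Ok(hours)
}

/// Reads both settings from the process environment.
fn vacuum_retention_hours_from_env() -> Result<u64, String> {
    let hours = env::var("SCOUTER_TRACE_VACUUM_RETENTION_HOURS").ok();
    let allow = env::var("SCOUTER_TRACE_UNSAFE_VACUUM_ALLOW_ZERO").ok();
    resolve_retention_hours(hours.as_deref(), allow.as_deref())
}
```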

Eval scenarios now include a non-null partition_date column, derived from created_at in UTC, and are written to Delta partitions under partition_date=YYYY-MM-DD.
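A std-only sketch of deriving the partition value, assuming created_at is stored as microseconds since the Unix epoch (the PR does not state the column's type). It uses Howard Hinnant's days-to-civil algorithm so no date crate is needed; a real implementation would more likely rely on a date library. partition_date and partition_dir are hypothetical helper names.

```rust
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Converts days since 1970-01-01 into a proleptic Gregorian (year, month, day)
/// using Howard Hinnant's days-to-civil algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March = 0 ... February = 11
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year_offset = if month <= 2 { 1 } else { 0 };
    (yoe + era * 400 + year_offset, month, day)
}

/// UTC calendar date (YYYY-MM-DD) for a created_at timestamp in microseconds.
/// div_euclid keeps instants before 1970 on the correct UTC day.
fn partition_date(created_at_us: i64) -> String {
    let (y, m, d) = civil_from_days(created_at_us.div_euclid(MICROS_PER_DAY));
    format!("{y:04}-{m:02}-{d:02}")
}

/// Delta partition directory name for a row.
fn partition_dir(created_at_us: i64) -> String {
    format!("partition_date={}", partition_date(created_at_us))
}
```

For example, a scenario created at 23:59:59.999999 UTC on 2024-05-13 lands in partition_date=2024-05-13 regardless of the writer's local time zone.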

File | Change
--- | ---
crates/scouter_dataframe/src/parquet/writer_props.rs | Adds shared writer-property profiles for trace, GenAI, dispatch, Bifrost, eval scenarios, and control writes.
crates/scouter_dataframe/src/parquet/maintenance.rs | Adds trace vacuum retention helpers and zero-retention validation.
crates/scouter_dataframe/src/parquet/mod.rs | Exposes the new maintenance and writer-property modules.
crates/scouter_dataframe/src/parquet/tracing/queries.rs | Replaces trace-id-only cache entries with SpanCacheKey, cache policy, weight estimation, and key-isolation tests.
crates/scouter_dataframe/src/parquet/tracing/engine.rs | Uses shared trace span writer properties and validates vacuum retention.
crates/scouter_dataframe/src/parquet/tracing/summary.rs | Uses shared summary writer properties for writes and optimize, and switches vacuum to the configured retention.
crates/scouter_dataframe/src/parquet/tracing/genai.rs | Uses shared GenAI writer properties.
crates/scouter_dataframe/src/parquet/tracing/dispatch.rs | Applies dispatch writer properties instead of leaving dispatch writes unconfigured.
crates/scouter_dataframe/src/parquet/tracing/service.rs | Runs post-delete vacuum with the configured retention window.
crates/scouter_dataframe/src/parquet/bifrost/engine.rs | Replaces inline Bifrost writer-property construction with the shared dynamic-schema profile.
crates/scouter_dataframe/src/parquet/control/engine.rs | Applies explicit control-table writer properties.
crates/scouter_dataframe/src/parquet/eval_scenarios/engine.rs | Adds partition_date, partitioned table creation and writes, schema evolution for existing local tables, and shared writer properties.
crates/scouter_dataframe/tests/eval_scenarios.rs | Verifies that eval scenario partition directories come from the UTC created_at date.
crates/scouter_settings/src/storage.rs | Adds env readers for span cache policy and trace vacuum safety settings.
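queries.rs is described as adding weight estimation alongside the cache key, and storage.rs as adding env readers for the span cache policy. A minimal sketch of weight-based admission, assuming a hypothetical CachedSpan with owned strings and an optional payload, a byte-count weight capped at u32::MAX, and a per-entry size ceiling; none of these names or rules come from the PR.

```rust
// Hypothetical cached span; the real span type has more fields.
struct CachedSpan {
    span_name: String,
    attributes_json: String,
    payload: Option<Vec<u8>>,
}

/// Rough byte estimate for a cache entry: fixed struct size plus the heap
/// bytes of owned strings and payloads. Saturates at u32::MAX.
fn estimate_weight(spans: &[CachedSpan]) -> u32 {
    let bytes: usize = spans
        .iter()
        .map(|s| {
            std::mem::size_of::<CachedSpan>()
                + s.span_name.len()
                + s.attributes_json.len()
                + s.payload.as_ref().map_or(0, |p| p.len())
        })
        .sum();
    u32::try_from(bytes).unwrap_or(u32::MAX)
}

/// Hypothetical admission rule: results heavier than max_entry_bytes are
/// returned to the caller but not inserted into the cache.
fn should_cache(spans: &[CachedSpan], max_entry_bytes: u32) -> bool {
    estimate_weight(spans) <= max_entry_bytes
}
```

Because include_payloads is part of the key, payload-bearing and payload-free results for the same trace are weighed as independent entries.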

Is this a Breaking Change?

Yes. eval_scenarios gains a new physical Delta column and a new partition layout; this lands before the table is released. Trace cache behavior also changes, because the cache key now includes the query shape. Existing public Rust and Python APIs are unchanged.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (olap-serving-layer@e106bf0).

Additional details and impacted files
@@                  Coverage Diff                  @@
##             olap-serving-layer     #303   +/-   ##
=====================================================
  Coverage                      ?   76.57%           
=====================================================
  Files                         ?       26           
  Lines                         ?      918           
  Branches                      ?        0           
=====================================================
  Hits                          ?      703           
  Misses                        ?      215           
  Partials                      ?        0           

thorrester merged commit d23a972 into olap-serving-layer on May 14, 2026
20 checks passed