fix(trace-stats)!: add grpc_method to aggregation key#2151
Conversation
Spans with different gRPC methods were previously merged into the same stats group (only the first span's method was kept). Adding grpc_method to FixedAggregationKey ensures each method gets a separate bucket. The OtlpExactGroup.grpc_method field is now sourced from the key rather than a GroupedStats sidecar. The agent /v0.6/stats protobuf wire format is unchanged (no grpc_method field in ClientGroupedStats). SHM_VERSION bumped to 2 because FixedAggregationKey<StringRef> is #[repr(C)] and the new field changes the layout; mismatched sidecar/worker pairs will safely fail with a version-mismatch error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be18c4394d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 75dd3a3 | Docs | Datadog PR Page | Give us feedback! |
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
Bincode encodes struct fields positionally. Inserting grpc_method before http_status_code shifted all subsequent field positions, breaking IPC fallback decoding (OwnedShmSpanInput) between mismatched worker/sidecar versions. Moving it to the end of the struct preserves all existing field positions, so old-format IPC messages are decoded correctly up to the grpc_method field; the decode then fails with EOF rather than silently misinterpreting existing fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| } | ||
|
|
||
| pub(super) fn get_grpc_method<'a>(span: &'a impl StatSpan<'a>) -> &'a str { | ||
| fn get_grpc_method<'a>(span: &'a impl StatSpan<'a>) -> &'a str { |
There was a problem hiding this comment.
This method is currently unreleased (so this change should be safe)
| error_max: u64, | ||
| // gRPC method for OTLP export only; not part of the aggregation key so agent stats are | ||
| // unaffected. | ||
| pub(super) grpc_method: String, |
There was a problem hiding this comment.
This change is also unreleased
VianneyRuhlmann
left a comment
There was a problem hiding this comment.
Some concerns regarding cardinality for non-otlp exporter otherwise LGTM
| .into_key(), | ||
| ), | ||
| // grpc.method.name is carried in GroupedStats (for OTLP), not in the aggregation key. | ||
| // grpc.method.name is part of the aggregation key. |
There was a problem hiding this comment.
I don't think this comment is needed anymore
| http_method, | ||
| http_endpoint, | ||
| service_source, | ||
| grpc_method, |
There was a problem hiding this comment.
This means that we will aggregate based on grpc_method even when not using OTLP (increasing cardianlity). A given concentrator is either always used for OTLP or not at all right ? Then we could add a config option to the concentrator to ignore this field when not in otlp
| pub grpc_status_code: Option<u8>, | ||
| pub is_synthetics_request: bool, | ||
| pub is_trace_root: bool, | ||
| // Appended last to preserve bincode field-order compatibility with existing IPC messages. |
There was a problem hiding this comment.
You can remove this comment
What does this PR do?
Adds
grpc_method(grpc.method.name/rpc.method) toFixedAggregationKeyso spans with different gRPC methods get separate stats buckets. Previously the method was stored as a sidecar on the first span inserted into a group, so all spans sharing the same other key fields were merged regardless of their gRPC method — producing OTLP metric data points with an arbitrary method label.Also moves
grpc_methodto the last field position in the struct to preserve bincode field-order compatibility with existingOwnedShmSpanInputIPC messages.Motivation
Correctness fix for the OTLP trace metrics feature landed in #2067.
Risk
Breaking changes (major semver):
FixedAggregationKey<T>gains a publicgrpc_method: Tfield. Callers constructing the struct by name must add the new field;..Default::default()callers are unaffected.SHM_VERSIONbumped 1 → 2. The shared-memory key header is#[repr(C)]and the new field changes its layout; a mismatched sidecar/worker pair fails with a version error rather than silently reading bad data.StatsBucket::insertloses itsgrpc_methodparameter (waspub(super), internal only).Blast radius is limited because:
mainafter the v36 release (June 19). No released version is affected — this bug only exists in unreleased code.set_otlp_metrics_endpointis configured. No SDK enables this today./v0.6/statsprotobuf wire format is unchanged —ClientGroupedStatshas nogrpc_methodfield, so the stats-to-agent path is unaffected.libdd.{version}@{pid}.sock) ensures old workers connect to old sidecars; the IPC fallback (OwnedShmSpanInput) bincode risk only applies during a coordinated in-place upgrade within the same version.Urgency: low. No live production systems are affected. This must land before the next release that ships the OTLP metrics feature.
How to test the change?