feat(data-pipeline): OTLP HTTP/protobuf trace export#2115
Conversation
Records the approved design: vendor OTLP trace + collector protos and generate prost types (zero new runtime deps), keep the hand-rolled serde JSON path, share one mapper with a serde->prost converter, and select the protocol via builder + C FFI. Includes the dd-trace-py companion wiring and the layered E2E plan (local receiver, system-tests, sdk-backend-verify). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bite-sized, TDD-structured plan across 9 phases: vendor+generate prost types, serde->prost converter, encoder dispatch, protocol config through builder + C FFI, full validation gauntlet + libdatadog PR, then dd-trace-py PyO3/writer wiring and three E2E tiers (local receiver, system-tests, sdk-backend-verify) + dd-trace-py PR. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vendors opentelemetry/proto/trace/v1/trace.proto and opentelemetry/proto/collector/trace/v1/trace_service.proto from open-telemetry/opentelemetry-proto commit 1e725b853bc8f6b46ee62e8232e4c83017b9536f (matching the already-vendored common.proto and resource.proto). Adds both protos to the prost_build compile list in build.rs, generates the committed Rust types (opentelemetry.proto.trace.v1.rs and opentelemetry.proto.collector.trace.v1.rs), and updates _includes.rs. Also qualifies "Span" → "pb.Span" / "pb.idx.Span" in build.rs type_attribute calls to prevent serde derives from leaking into the new opentelemetry::proto::trace::v1::Span type.
…ests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-type match Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o avoid doctest failures
… Grpc content-type arm
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 79706e7 | Docs | Datadog PR Page | Give us feedback! |
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
ee71538 to
664f16f
Compare
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
The OTLP design spec and implementation plan are linked from the PR description (internal chonk) rather than committed to the repo. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mabdinur
left a comment
There was a problem hiding this comment.
Approval stands, these are all nits / follow-up material, nothing blocking:
-
proto_convert.rs (perf): on the http/protobuf path, mapper.rs formats every ID to a hex string and every timestamp to a string, then this code reverses it (hex_to_bytes/parse_u64) and deep-clones the whole serde tree, per span per export. Native IDs are u128/u64 already. Reasonable as a documented v1, but worth a follow-up to build the prost types straight from the native span types and skip the round-trip.
-
otlp/config.rs: OtlpProtocol is public but not #[non_exhaustive]. Nothing breaks today since it's new, but without it any future variant becomes a breaking change for downstream exhaustive matches. Worth adding now while the enum is fresh.
-
otlp_encoder/mod.rs (scope question): confirming this is traces-only, which seems like the right first step with OtlpProtocol as the encoding axis. Are metrics/logs intentionally out of scope here, handled by separate signal pipelines rather than extending this path?
-
json_types.rs (maintainability): the OTLP schema now has two hand-maintained representations (serde json_types and prost) bridged by the From impls in proto_convert, with no shared source of truth, so a field added to one side can silently diverge. The cross-encoder parity test only compares name and spanId. Could we extend it to trace_id (base64 vs raw bytes), status, and at least one attribute so drift gets caught?
-
libdd-data-pipeline-ffi/src/trace_exporter.rs: the protocol string re-parse can't fail today since the setter already restricts the values, so an error here is a silent no-op. If the accepted set ever drifts from FromStr this would quietly drop the protocol. Consider expect/log, or rely on the builder validation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t; both encodings
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… encoding, errors on grpc Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The build-time gRPC check fired unconditionally, so a `grpc` protocol (e.g. resolved from the OTel-default OTEL_EXPORTER_OTLP_PROTOCOL) failed the build of a normal Datadog-agent exporter even with no OTLP endpoint set — violating the "protocol is inert without an endpoint" contract. Move the rejection inside the OTLP-endpoint branch so it only fires when OTLP export is actually enabled. The send-time arm remains a guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…per-span alloc Pre-push review cleanups: keep the internal OtlpWireProtocol and the json_serializer module crate-private (not public API surface), and use eq_ignore_ascii_case in the span-kind mappers to avoid a per-span to_lowercase() allocation on the encode hot path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Criterion benches for the OTLP encoder hot paths — native spans -> prost IR (the mapper), and prost IR -> HTTP/protobuf and HTTP/JSON wire — plus end-to-end map+encode, over ~1000-span payloads (one large trace and many small traces). Inputs are decoded from msgpack into borrowed SpanSlices, matching the production exporter path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… hex Pre-size the per-export span Vec and per-span attribute Vec in the prost mapper to avoid reallocations as they fill. This also makes the resulting prost IR more compact, speeding up the downstream protobuf encode (a sequential read of the IR). Serialize OTLP/JSON trace/span ids from a stack buffer (hex::encode_to_slice) instead of allocating a String per id. Benches (~1000 spans): map -21%, protobuf encode -19%, JSON encode -9%, end-to-end -10..12%. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ck buffer OTLP encodes 64-bit integers (nanosecond timestamps and intValue attributes) as JSON strings to avoid precision loss; these were each allocating a String per span via to_string(). Format them into a stack buffer instead (NumStr, mirroring the HexId id writer). Removes ~3 timestamp + N int-attribute String allocations per span on the JSON path. Bench: encode_json a further -4.5% (now -12.5% vs the original baseline). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s text Addresses the build.rs comment-strip review note. `[lib] doctest = false` does not suppress doctests under `cargo test --doc` (a required gate), so instead fence the one offending example block (Span.attributes in trace.proto) as a ```text block — rustdoc renders it as text, not a Rust doctest — and drop `disable_comments` so the OTLP proto docs are generated onto the prost structs again. `cargo test --doc` stays green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_otlp_grpc_without_endpoint_still_builds builds a real TraceExporter, which spawns the agent_info worker and makes syscalls miri can't execute. Add #[cfg_attr(miri, ignore)] to match the file's other live-build tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
[EDIT: you can ignore, see my later comment below].
Following @ekump 's comment, I also wonder why all the helpers in the mapper take spans by reference, while the main mapping functions actually has ownership. I feel like we could avoid most of the string clones when we're converting native spans to the IR, if we did this transformation destructively (which is fine, we have ownership)? Though I might be missing some details.
|
Ok, I'll take back the comment on owning spans. Spans text is an abstract |
…ress review
Per reviewer feedback (ekump/bengl/yannham), the public OTLP protocol no longer
carries the unsupported gRPC variant:
- OtlpProtocol is now {HttpJson, HttpProtobuf}; `grpc` is rejected at the parse
boundary (FromStr) instead of constructed and guarded downstream.
- Remove the internal OtlpWireProtocol; fold content_type()/encode() onto
OtlpProtocol (identical once gRPC was gone). The send path encodes via
OtlpProtocol::encode; the exporter derives content-type from config.protocol.
- Drop #[non_exhaustive]; remove the build-time gRPC rejection and the obsolete
builder tests, since gRPC can no longer be constructed.
- Narrow the public surface: otlp is pub(crate); only OtlpProtocol is re-exported
(libdd_data_pipeline::OtlpProtocol). OtlpTraceConfig and the mapper helpers are
pub(crate).
Also: nest push_str_attr, `(p >= 1.0) as u32`, drop a type annotation, idiomatic
iterator/.map(), doc-links, document DecimalBuf, declutter bench code, and add a
protobuf round-trip test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Integrate the OTLP metrics export feature that landed on main with this PR's protobuf trace export + prost-IR rework: - exporter: send_otlp_http takes a content_type param (was hardcoded JSON); the trace wrapper passes config.protocol.content_type() (JSON or protobuf), metrics passes application/json. - builder: keep both the otlp_protocol field and main's metrics fields; the trace OtlpTraceConfig uses self.otlp_protocol (not hardcoded HttpJson). - mapper: port main's `_dd.stats_computed` resource attribute (from the new client_computed_stats field) onto the prost build_resource; port its two stats tests to prost assertions. - metrics depended on the deleted json_types only for status_code constants; those now live in (and are re-used from) otlp_encoder::mapper::status_code. - otlp module keeps main's pub(crate) visibility; OtlpProtocol stays the one externally re-exported symbol (libdd_data_pipeline::OtlpProtocol). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
map_span_link hardcoded `flags: 0`, dropping the linked context's W3C trace
flags (the v04 SpanLink.flags field is populated by the msgpack decoder), so
OTLP consumers saw incorrect link metadata. Set `flags: link.flags`; add
regression tests on both the protobuf (mapper) and JSON (serializer) paths.
Also from pre-push review:
- json_serializer: fix a stale comment (output-buffer pre-sizing was measured and
reverted as a regression; plain `to_vec` is intentional).
- config: make OtlpProtocol::{content_type,encode} pub(crate) so serde_json::Error
does not leak into the public API (the type stays public for the FFI).
- lib: the OtlpProtocol re-export carries a non-doc impl note.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The shared benchmark suite runs ~53 min on main against a 60 min GitLab job cap; the 10 OTLP benches (5 types x 2 fixtures) pushed it over the limit. Drop the 100x10 fixture, keeping all five bench types (map_to_prost, encode_protobuf, encode_json, e2e_protobuf, e2e_json) on the 1x1000 fixture. Both fixtures totalled ~1000 spans and gave similar signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yannham
left a comment
There was a problem hiding this comment.
All concerns addressed for me 👍
In the long run we might want to look at deserializing native span directly to protobuf without going through prost and the IR at some point, but this is obviously way out of the scope of this PR and OTLP in general.
What does this PR do?
Adds OTLP HTTP/protobuf as a trace-export encoding alongside HTTP/JSON, selectable via the OTel-standard
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL(http/jsondefault,http/protobuf). The generatedprostOTLP types are the single intermediate representation: the mapper builds them directly from native spans, protobuf isprost::encode_to_vec, and JSON is a serde serializer over the same types.Motivation
libdatadog's OTLP trace export only spoke HTTP/JSON. SDKs that honor
OTEL_EXPORTER_OTLP_TRACES_PROTOCOLneedhttp/protobufto match the OTel default and to talk to collectors that expect protobuf.Additional Notes
encode_otlp_protobufisencode_to_vec;encode_otlp_jsonis a serde serializer (json_serializer.rs) emitting OTLP-spec JSON (hex ids, base64 bytes, int64-as-string, lowerCamelCase, proto3 defaults omitted).OtlpProtocolis{HttpJson, HttpProtobuf};grpcis rejected at the parse boundary (FromStr) rather than carried as an unsupported variant.content_type()/encode()live on the type.Link.flags.How to test the change?
cargo test -p libdd-trace-utils -p libdd-data-pipeline -p libdd-data-pipeline-ffi,cargo test --doc, clippy, fmt, andcargo ffi-testpass. A parity test asserts the JSON and protobuf encodings carry the same span from the one prost IR; a protobuf round-trip test asserts the encoding is lossless.application/x-protobuf(HTTP 200), spans ingested with correct service/resource and a preserved 128-bit trace id.cargo bench -p libdd-trace-utils --bench main -- otlp/.BREAKING CHANGE: removes the previously public
libdd_trace_utils::otlp_encoder::json_typesmodule (the hand-rolled OTLP JSON model). OTLP encoding now builds prost-generated types as the single IR. libdatadog consumers pin by version, so they pick this up on the next release.