fix(tools): collapse intra-file duplicate typedefs in base header #2149

Open
lloeki wants to merge 1 commit into main from lloeki/fix-dedup-headers-intra-file-typedefs
@lloeki

lloeki commented Jun 22, 2026

Member

What

Enhance the dedup_headers dev tool so the generated/bundled
include/datadog/common.h no longer contains duplicate typedefs.

Why

dedup_headers only removed definitions from child headers that were
byte-identical to ones already present in the base header (common.h).
It never deduplicated definitions within the base header. When cbindgen
emits the same profiling type from two crate boundaries (e.g. via
after_includes forward declarations in libdd-profiling-ffi/cbindgen.toml
plus the regular body definition), the merged common.h ends up with
duplicate typedefs.

These are fatal for consumers compiling in a pre-C11 mode with
-Werror -Wtypedef-redefinition (repeating a typedef for the same type is
only permitted from C11 onward). Two distinct classes were observed in the
v36.0.0 artifacts:

  1. Forward + full-struct collision — a typedef struct X X; forward
    declaration coexisting with the full typedef struct X { ... } X;:
    ddog_prof_EncodedProfile, ddog_prof_StringId, OpaqueStringId.
  2. Exact-duplicate pointer typedefs emitted twice, identical except that
    one carries a doc comment (so the existing exact-string dedup keeps both):
    ddog_prof_StringId2, ddog_prof_MappingId2, ddog_prof_FunctionId2.

How

Add a final pass (dedup_base_typedefs) over the assembled base header that:

  • drops a bare forward typedef struct/union/enum X X; when a full-body
    definition of the same name X exists elsewhere in the file (keeping the
    body, regardless of ordering), and
  • drops later duplicates of an identical typedef statement, comparing the
    statement text with any leading doc comment stripped.

Opaque forward declarations (no body) and genuine aliases
(typedef struct A B; with A != B) are preserved.
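
The two rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the base header has already been split into one string per top-level definition, that doc comments are `/* ... */` blocks, and the helper names (`dedup_typedefs`, `forward_typedef_name`, and so on) are hypothetical even where they resemble names visible in the review snippets.

```rust
use std::collections::HashSet;

/// True for a plain ASCII C identifier (letters, digits, `_`, no leading digit).
fn is_ident(s: &str) -> bool {
    let mut bytes = s.bytes();
    matches!(bytes.next(), Some(b) if b.is_ascii_alphabetic() || b == b'_')
        && bytes.all(|b| b.is_ascii_alphanumeric() || b == b'_')
}

/// Returns the definition with one leading `/* ... */` comment stripped.
/// Assumption: doc comments are block comments placed before the statement.
fn statement_text(def: &str) -> &str {
    let s = def.trim();
    if s.starts_with("/*") {
        if let Some(end) = s[2..].find("*/") {
            return s[2 + end + 2..].trim_start();
        }
    }
    s
}

fn is_typedef(stmt: &str) -> bool {
    stmt.split_whitespace().next() == Some("typedef")
}

/// `typedef struct X X;` (also union/enum) yields `Some("X")`.
/// Aliases such as `typedef struct A B;` yield `None`.
fn forward_typedef_name(stmt: &str) -> Option<&str> {
    let body = stmt.trim().strip_suffix(';')?;
    let mut tokens = body.split_whitespace();
    if tokens.next()? != "typedef" {
        return None;
    }
    if !matches!(tokens.next()?, "struct" | "union" | "enum") {
        return None;
    }
    let tag = tokens.next()?;
    let alias = tokens.next()?;
    if tokens.next().is_some() {
        return None;
    }
    (tag == alias && is_ident(tag)).then_some(tag)
}

/// `typedef struct X { ... } X;` (also union/enum) yields `Some("X")`.
fn bodied_typedef_name(stmt: &str) -> Option<&str> {
    let body = stmt.trim().strip_suffix(';')?;
    let open = body.find('{')?;
    let close = body.rfind('}')?;
    let mut head = body[..open].split_whitespace();
    if head.next()? != "typedef" {
        return None;
    }
    if !matches!(head.next()?, "struct" | "union" | "enum") {
        return None;
    }
    let tag = head.next()?;
    let name = body[close + 1..].trim();
    (tag == name && is_ident(name)).then_some(name)
}

/// Drops forward typedefs shadowed by a full-body definition (in any order)
/// and later repeats of an identical typedef statement, ignoring doc comments.
fn dedup_typedefs<'a>(defs: &[&'a str]) -> Vec<&'a str> {
    let bodied: HashSet<&str> = defs
        .iter()
        .filter_map(|d| bodied_typedef_name(statement_text(*d)))
        .collect();
    let mut seen_typedef_statements: HashSet<&str> = HashSet::new();
    defs.iter()
        .copied()
        .filter(|d| {
            let stmt = statement_text(*d);
            if let Some(name) = forward_typedef_name(stmt) {
                if bodied.contains(name) {
                    return false;
                }
            }
            // Non-typedef definitions pass through untouched.
            !is_typedef(stmt) || seen_typedef_statements.insert(stmt)
        })
        .collect()
}
```

In this sketch the first occurrence of a repeated statement is the one kept, so whether the surviving copy carries the doc comment depends on emission order; a real pass might prefer the documented copy.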

This makes common.h clean by construction and obsoletes downstream
post-processing workarounds (e.g. the one in libdatadog-rb).

Validation

Headers were generated via the FFI crates' cbindgen build scripts and run
through dedup_headers exactly as builder invokes it.

Before — each of the six types appears twice; clang fails:

$ clang -std=gnu99 -Werror -Wtypedef-redefinition -I<out>/include -fsyntax-only t.c
common.h:541: error: redefinition of typedef 'ddog_prof_EncodedProfile' ...
common.h:909: error: redefinition of typedef 'ddog_prof_StringId2' ...
common.h:936: error: redefinition of typedef 'ddog_prof_MappingId2' ...
common.h:965: error: redefinition of typedef 'ddog_prof_FunctionId2' ...
common.h:1166: error: redefinition of typedef 'OpaqueStringId' ...
common.h:1516: error: redefinition of typedef 'ddog_prof_StringId' ...
6 errors generated.

After — each type appears exactly once; clang passes (exit 0):

ddog_prof_EncodedProfile : 1
ddog_prof_StringId       : 1
OpaqueStringId           : 1
ddog_prof_StringId2      : 1
ddog_prof_MappingId2     : 1
ddog_prof_FunctionId2    : 1
  • cargo test -p tools --lib — 20 passed (5 new tests for the dedup pass)
  • cargo clippy -p tools --all-targets --all-features -- -D warnings — clean
  • cargo fmt -p tools -- --check — clean

Note

One of three coordinated changes for the libdatadog v36 duplicate-typedef header issue (increasing order of permanence):

  • dd-trace-rb#5928 — immediate CI mitigation: bump dd-trace-rb to v36 plus a temporary -Wno-error=typedef-redefinition stopgap.
  • libdatadog-rb#62 — gem-level fix: strip the duplicate typedefs during vendoring and ship 36.0.0.1.1, without waiting for a libdatadog release.
  • This PR (libdatadog#2149) — upstream fix in dedup_headers: makes common.h clean by construction and obsoletes the libdatadog-rb post-processing once released.

@github-actions

github-actions Bot commented Jun 22, 2026

Contributor

📚 Documentation Check Results

⚠️ 173 documentation warning(s) found

📦 tools - 173 warning(s)


Updated: 2026-06-22 16:07:28 UTC | Commit: 7e0d4bf | missing-docs job results

@github-actions

Contributor

Clippy Allow Annotation Report

Comparing clippy allow annotations between branches:

  • Base Branch: origin/main
  • PR Branch: origin/lloeki/fix-dedup-headers-intra-file-typedefs

Summary by Rule

Rule | Base Branch | PR Branch | Change

Annotation Counts by File

File | Base Branch | PR Branch | Change

Annotation Stats by Crate

Crate | Base Branch | PR Branch | Change
clippy-annotation-reporter | 5 | 5 | No change (0%)
datadog-ffe-ffi | 1 | 1 | No change (0%)
datadog-ipc | 22 | 22 | No change (0%)
datadog-live-debugger | 4 | 4 | No change (0%)
datadog-live-debugger-ffi | 10 | 10 | No change (0%)
datadog-profiling-replayer | 4 | 4 | No change (0%)
datadog-sidecar | 45 | 45 | No change (0%)
libdd-common | 13 | 13 | No change (0%)
libdd-common-ffi | 12 | 12 | No change (0%)
libdd-data-pipeline | 6 | 6 | No change (0%)
libdd-ddsketch | 2 | 2 | No change (0%)
libdd-dogstatsd-client | 1 | 1 | No change (0%)
libdd-profiling | 13 | 13 | No change (0%)
libdd-remote-config | 3 | 3 | No change (0%)
libdd-telemetry | 20 | 20 | No change (0%)
libdd-tinybytes | 4 | 4 | No change (0%)
libdd-trace-normalization | 2 | 2 | No change (0%)
libdd-trace-obfuscation | 3 | 3 | No change (0%)
libdd-trace-stats | 1 | 1 | No change (0%)
libdd-trace-utils | 11 | 11 | No change (0%)
Total | 182 | 182 | No change (0%)

About This Report

This report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality.

@datadog-official

datadog-official Bot commented Jun 22, 2026


⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/apm-reliability/libdatadog | benchmarks

ℹ️ Info

No other issues found

🧪 All tests passed
❄️ No new flaky tests detected

🎯 Code Coverage
Patch Coverage: 93.80%
Overall Coverage: 73.98% (+0.06%)

🔗 Commit SHA: 77f60a7

@github-actions

github-actions Bot commented Jun 22, 2026

Contributor

🔒 Cargo Deny Results

⚠️ 1 issue(s) found, showing only errors (advisories, bans, sources)

📦 tools - 1 error(s)

error[unsound]: Rand is unsound with a custom logger using `rand::rng()`
   ┌─ /home/runner/work/libdatadog/libdatadog/Cargo.lock:76:1
   │
76 │ rand 0.8.5 registry+https://github.com/rust-lang/crates.io-index
   │ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unsound advisory detected
   │
   ├ ID: RUSTSEC-2026-0097
   ├ Advisory: https://rustsec.org/advisories/RUSTSEC-2026-0097
   ├ It has been reported (by @lopopolo) that the `rand` library is [unsound](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#soundness-of-code--of-a-library) (i.e. that safe code using the public API can cause Undefined Behaviour) when all the following conditions are met:
     
     - The `log` and `thread_rng` features are enabled
     - A [custom logger](https://docs.rs/log/latest/log/#implementing-a-logger) is defined
     - The custom logger accesses `rand::rng()` (previously `rand::thread_rng()`) and calls any `TryRng` (previously `RngCore`) methods on `ThreadRng`
     - The `ThreadRng` (attempts to) reseed while called from the custom logger (this happens every 64 kB of generated data)
     - Trace-level logging is enabled or warn-level logging is enabled and the random source (the `getrandom` crate) is unable to provide a new seed
     
     `TryRng` (previously `RngCore`) methods for `ThreadRng` use `unsafe` code to cast `*mut BlockRng<ReseedingCore>` to `&mut BlockRng<ReseedingCore>`. When all the above conditions are met this results in an aliased mutable reference, violating the Stacked Borrows rules. Miri is able to detect this violation in sample code. Since construction of [aliased mutable references is Undefined Behaviour](https://doc.rust-lang.org/stable/nomicon/references.html), the behaviour of optimized builds is hard to predict.
   ├ Announcement: https://github.com/rust-random/rand/pull/1763
   ├ Solution: Upgrade to >=0.10.1 OR <0.10.0, >=0.9.3 OR <0.9.0, >=0.8.6 (try `cargo update -p rand`)
   ├ rand v0.8.5
     └── (dev) libdd-common v5.0.0
         └── tools v36.0.0

advisories FAILED, bans ok, sources ok

Updated: 2026-06-22 16:09:40 UTC | Commit: 7e0d4bf | dependency-check job results

dedup_headers only removed definitions from child headers that were
byte-identical to ones already present in the base header. It never
deduplicated definitions within the base header itself, so cbindgen
output that emits the same profiling type from two crate boundaries
left duplicate typedefs in the merged common.h.

Two cases survived and broke consumers compiling in a pre-C11 mode with
-Werror -Wtypedef-redefinition (repeated typedefs are a C11 feature):

1. A bare forward declaration "typedef struct X X;" coexisting with the
   full-body definition "typedef struct X { ... } X;" (e.g.
   ddog_prof_EncodedProfile, ddog_prof_StringId, OpaqueStringId).
2. An identical pointer typedef emitted twice whose doc comments differ,
   so the existing exact-string dedup kept both (ddog_prof_StringId2,
   ddog_prof_MappingId2, ddog_prof_FunctionId2).

Add a final pass over the assembled base header that drops a forward
struct/union/enum declaration when a full-body definition of the same
name exists, and removes later duplicates of an identical typedef
statement regardless of differing comments. This makes the generated
common.h clean by construction and removes the need for downstream
post-processing workarounds.
lloeki force-pushed the lloeki/fix-dedup-headers-intra-file-typedefs branch from ab502c6 to 77f60a7 on June 22, 2026 16:05
Comment thread tools/src/lib.rs
Comment on lines +123 to +125
match s.find("*/") {
    Some(end) => s[end + 2..].trim_start(),
    None => s,

Contributor

Suggested change
- match s.find("*/") {
-     Some(end) => s[end + 2..].trim_start(),
-     None => s,
+ s.find("*/").map(|end| s[end + 2..].trim_start()).unwrap_or(s);
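
As a rough check that the suggestion preserves behavior, the sketch below puts the two forms side by side in hypothetical helper functions (the function names are illustrative and not taken from the PR). Each form is written as the tail expression, without a trailing semicolon, so that the helper returns it; whether the same applies at the call site in tools/src/lib.rs is an assumption, since only three lines of the surrounding function are visible. `Option::map_or` is shown as a third equivalent spelling.

```rust
/// Original form from the diff: explicit match on the search result.
fn strip_with_match(s: &str) -> &str {
    match s.find("*/") {
        Some(end) => s[end + 2..].trim_start(),
        None => s,
    }
}

/// Suggested form: map the found index, fall back to the input.
fn strip_with_unwrap_or(s: &str) -> &str {
    s.find("*/").map(|end| s[end + 2..].trim_start()).unwrap_or(s)
}

/// Same logic with the default passed first.
fn strip_with_map_or(s: &str) -> &str {
    s.find("*/").map_or(s, |end| s[end + 2..].trim_start())
}
```

All three cut at the first `*/` found anywhere in the input, so they agree on every string, including ones where the comment is not at the start.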

Comment thread tools/src/lib.rs
}

fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')

Contributor

For a token to be a valid C identifier, its first character must not be a digit, and this test doesn't check that. I don't remember whether it matters in C, but in Rust, for example, this would match 0usize, which isn't an identifier.
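
One way to address this, sketched under the assumption that only ASCII identifiers need to be recognized (the name `is_c_ident` is hypothetical, and C keywords are not excluded):

```rust
/// Returns true if `s` is a plain ASCII C identifier: non-empty, made of
/// ASCII letters, digits, or `_`, and not starting with a digit.
fn is_c_ident(s: &str) -> bool {
    let mut bytes = s.bytes();
    match bytes.next() {
        // The first byte must be a letter or underscore.
        Some(first) if first.is_ascii_alphabetic() || first == b'_' => {
            // The remaining bytes may also be digits.
            bytes.all(|b| b.is_ascii_alphanumeric() || b == b'_')
        }
        // Empty input, or a leading digit or other character.
        _ => false,
    }
}
```

With this check, `0usize` is rejected while names such as `ddog_prof_StringId2` are still accepted.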

Comment thread tools/src/lib.rs
.filter_map(|d| bodied_typedef_name(statement_text(d.str)))
.collect();

let mut seen: HashSet<&str> = HashSet::new();

Contributor

It's not entirely clear what seen contains going forward. I would suggest giving it a more explicit name (I suppose it holds the typedefs that are not bodied?)

@dd-octo-sts

dd-octo-sts Bot commented Jun 22, 2026

Contributor

Artifact Size Benchmark Report

aarch64-alpine-linux-musl
Artifact | Baseline | Commit | Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so | 7.76 MB | 7.76 MB | 0% (0 B) 👌
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a | 84.52 MB | 84.52 MB | 0% (0 B) 👌

aarch64-unknown-linux-gnu
Artifact | Baseline | Commit | Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.43 MB | 10.43 MB | 0% (0 B) 👌
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a | 95.66 MB | 95.66 MB | 0% (0 B) 👌

libdatadog-x64-windows
Artifact | Baseline | Commit | Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll | 25.01 MB | 25.01 MB | 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib | 87.33 KB | 87.33 KB | 0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb | 181.76 MB | 181.75 MB | -0% (-16.00 KB) 👌
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib | 932.35 MB | 932.35 MB | 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll | 8.20 MB | 8.20 MB | 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib | 87.33 KB | 87.33 KB | 0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb | 24.12 MB | 24.12 MB | 0% (0 B) 👌
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib | 48.20 MB | 48.20 MB | 0% (0 B) 👌

libdatadog-x86-windows
Artifact | Baseline | Commit | Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll | 21.68 MB | 21.68 MB | 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib | 88.71 KB | 88.71 KB | 0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb | 185.74 MB | 185.73 MB | -0% (-16.00 KB) 👌
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib | 920.84 MB | 920.84 MB | 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll | 6.32 MB | 6.32 MB | 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib | 88.71 KB | 88.71 KB | 0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb | 25.88 MB | 25.88 MB | 0% (0 B) 👌
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib | 45.82 MB | 45.82 MB | 0% (0 B) 👌

x86_64-alpine-linux-musl
Artifact | Baseline | Commit | Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a | 75.35 MB | 75.35 MB | 0% (0 B) 👌
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so | 8.68 MB | 8.68 MB | 0% (0 B) 👌

x86_64-unknown-linux-gnu
Artifact | Baseline | Commit | Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a | 90.80 MB | 90.80 MB | 0% (0 B) 👌
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so | 10.55 MB | 10.55 MB | 0% (0 B) 👌

