31 changes: 26 additions & 5 deletions internal-api/src/main/java/datadog/trace/api/Config.java
@@ -2191,22 +2191,43 @@ private Config(final ConfigProvider configProvider, final InstrumenterConfig ins
configProvider.getBoolean(TRACE_STATS_COMPUTATION_IGNORE_AGENT_VERSION, false);
tracerMetricsBufferingEnabled =
configProvider.getBoolean(TRACER_METRICS_BUFFERING_ENABLED, false);
- tracerMetricsMaxAggregates = configProvider.getInteger(TRACER_METRICS_MAX_AGGREGATES, 2048);
+ // The metrics inbox is an MpscArrayQueue<SpanSnapshot>; each saturated slot holds one
+ // ~120 B SpanSnapshot. The historical default TRACER_METRICS_MAX_PENDING=2048 (logical) *
+ // LEGACY_BATCH_SIZE=64 = 131072 slots was sized for the prior conflating-Batch model where
+ // slot memory was only realized under burst; with one snapshot per slot, the worst-case
+ // in-flight footprint is ~15 MB. At Xmx <= ~128 MB the G1 survivor region is too small to
+ // absorb that footprint when the aggregator stalls -- observed catastrophically at Xmx64m
+ // petclinic where SpanSnapshots overflow young gen and trigger To-space Exhausted -> Full
+ // GC storms (0 r/s in the worst case).
+ //
+ // Cut the default accordingly:
+ // - normal heap: 128 logical * 64 = 8192 slots, ~1 MB worst-case in-flight. ~0.8 s of
+ // buffer at 10K spans/s, well above typical GC pause windows.
+ // - tight heap (Xmx < 128 MB): 64 logical * 64 = 4096 slots, ~500 KB worst case.
+ //
+ // Customers who explicitly configured TRACER_METRICS_MAX_PENDING keep their value (the
+ // LEGACY_BATCH_SIZE multiplier still applies to it) -- only the implicit default shrinks.
+ final boolean tightHeap = Runtime.getRuntime().maxMemory() < 128L * 1024 * 1024;
+ final int defaultMaxAggregates = tightHeap ? 256 : 2048;
+ final int defaultMaxPending = tightHeap ? 64 : 128;
Comment on lines +2211 to +2212
[P2] Sync metadata with the new tracer-metrics defaults

Changing the implicit default here leaves metadata/supported-configurations.json advertising DD_TRACE_TRACER_METRICS_MAX_PENDING and DD_TRACE_TRACER_METRICS_MAX_AGGREGATES as 2048. That file is the source used for supported-configuration metadata/docs, so users and config-inversion tooling will still see the old defaults even though normal heaps now get 128 pending and tight heaps get 64/256. Please update the metadata entry (or otherwise represent the heap-dependent default) along with this runtime change.


dougqh (Contributor, Author), May 29, 2026


From Claude...

Thanks. Pushed 9ab5e58e4e, which changes the DD_TRACE_TRACER_METRICS_MAX_PENDING default from 2048 to 128 to match the new normal-heap default.

DD_TRACE_TRACER_METRICS_MAX_AGGREGATES is left at 2048 because that is still the normal-heap default; only the tight-heap branch changes it to 256. The current schema is {version, type, default, aliases} with a single string default, so there is no way to encode a heap-dependent default without a schema extension (e.g. a defaultExpression field or a defaults: [{when, value}] shape). As a result, the tight-heap branches of both keys remain unrepresented in the metadata; the typical customer (Xmx ≥ 128 MB) sees the documented value.

If a schema extension is in scope, happy to take that as a follow-up.


+ tracerMetricsMaxAggregates =
+ configProvider.getInteger(TRACER_METRICS_MAX_AGGREGATES, defaultMaxAggregates);
/*
* TRACER_METRICS_MAX_PENDING historically counted conflating Batch slots (~64 spans per batch
* via Batch.MAX_BATCH_SIZE). The inbox now holds 1 SpanSnapshot per metrics-eligible span, so
* we multiply the configured value by the legacy batch size to preserve the effective
- * span-throughput capacity of the prior default *and* of any existing customer override
- * (e.g. a configured 4096 still means "~262144 spans before drops", same as before). ~100 B
- * per SpanSnapshot * 131072 ≈ 13 MB worst-case heap floor at the default.
+ * span-throughput capacity for any existing customer override (e.g. a configured 4096 still
+ * means "~262144 spans before drops", same as before).
*
* Promote the multiplication to long and clamp to MAX_SAFE_ARRAY_SIZE so an absurd customer
* override (>= ~33M) can't silently wrap to a negative int. MAX_SAFE_ARRAY_SIZE sits a few
* elements below Integer.MAX_VALUE because JVMs such as HotSpot cap array lengths slightly
* below it (the limit depends on object header size); see
* jdk.internal.util.ArraysSupport.SOFT_MAX_ARRAY_LENGTH for the same convention.
*/
long requestedMaxPending =
- (long) configProvider.getInteger(TRACER_METRICS_MAX_PENDING, 2048) * LEGACY_BATCH_SIZE;
+ (long) configProvider.getInteger(TRACER_METRICS_MAX_PENDING, defaultMaxPending)
+ * LEGACY_BATCH_SIZE;
tracerMetricsMaxPending = (int) Math.min(requestedMaxPending, MAX_SAFE_ARRAY_SIZE);

reportHostName =
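The sizing arithmetic and the overflow guard in the hunk above can be checked with a small standalone sketch. It is a hypothetical reimplementation, not dd-trace-java code: the class name, the ~120 B per-snapshot figure (taken from the code comment as an estimate), and MAX_SAFE_ARRAY_SIZE = Integer.MAX_VALUE - 8 (borrowed from the JDK's SOFT_MAX_ARRAY_LENGTH convention) are all assumptions. Only the 128 MB threshold, the 64/128 and 256/2048 defaults, and the multiply-then-clamp shape come from the diff.

```java
/**
 * Hypothetical standalone sketch of the heap-dependent tracer-metrics defaults and the
 * overflow-safe inbox sizing. Not dd-trace-java code; constants marked "assumed" are guesses.
 */
final class TracerMetricsSizingSketch {
  // Mirrors the "* 64" multiplier described in the Config.java comments.
  static final int LEGACY_BATCH_SIZE = 64;
  // Assumed value, following the JDK's SOFT_MAX_ARRAY_LENGTH (Integer.MAX_VALUE - 8).
  static final int MAX_SAFE_ARRAY_SIZE = Integer.MAX_VALUE - 8;
  // Same threshold as the diff: 128L * 1024 * 1024 bytes.
  static final long TIGHT_HEAP_THRESHOLD = 128L * 1024 * 1024;
  // Assumed per-SpanSnapshot size (~120 B per the comment); an estimate, not a measurement.
  static final long SNAPSHOT_BYTES = 120;

  static boolean isTightHeap(long maxHeapBytes) {
    return maxHeapBytes < TIGHT_HEAP_THRESHOLD;
  }

  static int defaultMaxAggregates(long maxHeapBytes) {
    return isTightHeap(maxHeapBytes) ? 256 : 2048;
  }

  static int defaultMaxPending(long maxHeapBytes) {
    return isTightHeap(maxHeapBytes) ? 64 : 128;
  }

  /** Converts a logical max-pending value into inbox slots without int overflow. */
  static int inboxSlots(int logicalMaxPending) {
    // Multiply in long first so large overrides cannot wrap negative before the clamp.
    long requested = (long) logicalMaxPending * LEGACY_BATCH_SIZE;
    return (int) Math.min(requested, MAX_SAFE_ARRAY_SIZE);
  }

  /** Worst-case in-flight bytes if every slot holds one snapshot. */
  static long worstCaseBytes(int slots) {
    return slots * SNAPSHOT_BYTES; // int * long promotes to long
  }

  public static void main(String[] args) {
    long heap = Runtime.getRuntime().maxMemory();
    int slots = inboxSlots(defaultMaxPending(heap));
    System.out.printf(
        "tightHeap=%b maxAggregates=%d slots=%d worstCase=%d bytes%n",
        isTightHeap(heap), defaultMaxAggregates(heap), slots, worstCaseBytes(slots));
  }
}
```

With these assumptions, the legacy default of 2048 yields 131072 slots and about 15.7 MB worst case, while 128 and 64 yield 8192 and 4096 slots (about 983 KB and 492 KB). The long promotion matters because, in int arithmetic, an override of 33,554,432 times 64 would already have wrapped to Integer.MIN_VALUE before Math.min could clamp it.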
4 changes: 2 additions & 2 deletions metadata/supported-configurations.json
@@ -10915,9 +10915,9 @@
],
"DD_TRACE_TRACER_METRICS_MAX_PENDING": [
{
- "version": "A",
+ "version": "B",
"type": "int",
- "default": "2048",
+ "default": "128",
AlexeyKuznetsov-DD marked this conversation as resolved.
"aliases": []
}
],
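The reply in the thread notes that the current schema carries a single string default, so the tight-heap values cannot be expressed. The following is a minimal sketch of how the floated defaults: [{when, value}] shape could be evaluated. It assumes first-match-wins ordering and a condition on the JVM's max heap bytes. All class and field names are hypothetical, and nothing here reflects existing supported-configurations.json tooling. The entries reproduce the runtime defaults from the Config.java hunk for both keys.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

/**
 * Hypothetical model of a "defaults: [{when, value}]" metadata entry. Not an existing schema;
 * the first-match-wins rule and the max-heap condition are assumptions for illustration.
 */
final class ConditionalDefaultSketch {
  /** One {when, value} entry: the value applies when the predicate on max heap bytes holds. */
  static final class ConditionalDefault {
    final String when; // human-readable condition, as docs might render it
    final LongPredicate matches; // executable form of the same condition
    final String value; // kept as a string, like the existing single "default" field

    ConditionalDefault(String when, LongPredicate matches, String value) {
      this.when = when;
      this.matches = matches;
      this.value = value;
    }
  }

  static final long MB = 1024L * 1024;

  // Mirrors the runtime defaults for DD_TRACE_TRACER_METRICS_MAX_PENDING.
  static final List<ConditionalDefault> MAX_PENDING_DEFAULTS =
      Arrays.asList(
          new ConditionalDefault("maxHeap < 128 MB", heap -> heap < 128 * MB, "64"),
          new ConditionalDefault("otherwise", heap -> true, "128"));

  // Mirrors the runtime defaults for DD_TRACE_TRACER_METRICS_MAX_AGGREGATES.
  static final List<ConditionalDefault> MAX_AGGREGATES_DEFAULTS =
      Arrays.asList(
          new ConditionalDefault("maxHeap < 128 MB", heap -> heap < 128 * MB, "256"),
          new ConditionalDefault("otherwise", heap -> true, "2048"));

  /** Returns the value of the first entry whose condition matches, or throws if none does. */
  static String resolve(List<ConditionalDefault> defaults, long maxHeapBytes) {
    for (ConditionalDefault d : defaults) {
      if (d.matches.test(maxHeapBytes)) {
        return d.value;
      }
    }
    throw new IllegalStateException("no default matches maxHeapBytes=" + maxHeapBytes);
  }

  public static void main(String[] args) {
    long heap = Runtime.getRuntime().maxMemory();
    System.out.println("MAX_PENDING default: " + resolve(MAX_PENDING_DEFAULTS, heap));
    System.out.println("MAX_AGGREGATES default: " + resolve(MAX_AGGREGATES_DEFAULTS, heap));
  }
}
```

Keeping a catch-all "otherwise" entry last means documentation tooling could still show a single headline value (the normal-heap default) while the conditional entries describe the tight-heap branch.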