From a9acd8d1d54c0d5a803f26aa56b0a0a274a2f907 Mon Sep 17 00:00:00 2001
From: Douglas Q Hawkins
Date: Fri, 29 May 2026 09:06:34 -0400
Subject: [PATCH 1/3] Tighten tracer.metrics defaults to protect tight-heap JVMs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cut the implicit TRACER_METRICS_MAX_PENDING default from 2048 (logical)
to 128 on normal heap and to 64 at Xmx < 128 MB, and the implicit
TRACER_METRICS_MAX_AGGREGATES default from 2048 to 256 at tight heap.

Why
---
The metrics inbox is an MpscArrayQueue sized to maxPending *
LEGACY_BATCH_SIZE (64). With one ~120 B SpanSnapshot per slot, the
prior 131072-slot default pinned ~15 MB worst-case in-flight when the
aggregator stalled. At Xmx <= ~128 MB the G1 survivor region is too
small to absorb that footprint -- observed catastrophically at Xmx
64 MB on spring-petclinic where the inbox overflowed young gen and
triggered To-space Exhausted → Full GC storms (0 r/s in the worst
case).

New defaults bound the worst-case in-flight footprint at ~1 MB on
normal heap and ~500 KB at tight heap, comfortably below typical
survivor sizes and large enough to absorb the sub-second consumer
stalls we actually see in practice (~0.8 s of buffer at 10 K spans/s
on the normal-heap default).

Customers who explicitly configure TRACER_METRICS_MAX_PENDING are
unaffected; the LEGACY_BATCH_SIZE multiplier still applies to
overrides. Only the implicit defaults shrink.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../main/java/datadog/trace/api/Config.java   | 31 ++++++++++++++++---
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/internal-api/src/main/java/datadog/trace/api/Config.java b/internal-api/src/main/java/datadog/trace/api/Config.java
index cfb7411325d..cec8ccc8403 100644
--- a/internal-api/src/main/java/datadog/trace/api/Config.java
+++ b/internal-api/src/main/java/datadog/trace/api/Config.java
@@ -2191,14 +2191,34 @@ private Config(final ConfigProvider configProvider, final InstrumenterConfig ins
         configProvider.getBoolean(TRACE_STATS_COMPUTATION_IGNORE_AGENT_VERSION, false);
     tracerMetricsBufferingEnabled =
         configProvider.getBoolean(TRACER_METRICS_BUFFERING_ENABLED, false);
-    tracerMetricsMaxAggregates = configProvider.getInteger(TRACER_METRICS_MAX_AGGREGATES, 2048);
+    // The metrics inbox is an MpscArrayQueue; each saturated slot holds one
+    // ~120 B SpanSnapshot. The historical default TRACER_METRICS_MAX_PENDING=2048 (logical) *
+    // LEGACY_BATCH_SIZE=64 = 131072 slots was sized for the prior conflating-Batch model where
+    // slot memory was only realized under burst; with one snapshot per slot, the worst-case
+    // in-flight footprint is ~15 MB. At Xmx <= ~128 MB the G1 survivor region is too small to
+    // absorb that footprint when the aggregator stalls -- observed catastrophically at Xmx64m
+    // petclinic where SpanSnapshots overflow young gen and trigger To-space Exhausted -> Full
+    // GC storms (0 r/s in the worst case).
+    //
+    // Cut the default accordingly:
+    //   - normal heap: 128 logical * 64 = 8192 slots, ~1 MB worst-case in-flight. ~0.8 s of
+    //     buffer at 10K spans/s, well above typical GC pause windows.
+    //   - tight heap (Xmx < 128 MB): 64 logical * 64 = 4096 slots, ~500 KB worst case.
+    //
+    // Customers who explicitly configured TRACER_METRICS_MAX_PENDING keep their value (the
+    // LEGACY_BATCH_SIZE multiplier still applies to it) -- only the implicit default shrinks.
+    final boolean tightHeap = Runtime.getRuntime().maxMemory() < 128L * 1024 * 1024;
+    final int defaultMaxAggregates = tightHeap ? 256 : 2048;
+    final int defaultMaxPending = tightHeap ? 64 : 128;
+
+    tracerMetricsMaxAggregates =
+        configProvider.getInteger(TRACER_METRICS_MAX_AGGREGATES, defaultMaxAggregates);
     /*
      * TRACER_METRICS_MAX_PENDING historically counted conflating Batch slots (~64 spans per batch
      * via Batch.MAX_BATCH_SIZE). The inbox now holds 1 SpanSnapshot per metrics-eligible span, so
      * we multiply the configured value by the legacy batch size to preserve the effective
-     * span-throughput capacity of the prior default *and* of any existing customer override
-     * (e.g. a configured 4096 still means "~262144 spans before drops", same as before). ~100 B
-     * per SpanSnapshot * 131072 ≈ 13 MB worst-case heap floor at the default.
+     * span-throughput capacity for any existing customer override (e.g. a configured 4096 still
+     * means "~262144 spans before drops", same as before).
      *
      * Long-promote the multiplication and clamp to MAX_SAFE_ARRAY_SIZE so an absurd customer
      * override (>= ~33M) can't silently wrap to a negative int. MAX_SAFE_ARRAY_SIZE sits a few
@@ -2206,7 +2226,8 @@ private Config(final ConfigProvider configProvider, final InstrumenterConfig ins
      * see java.util.ArraysSupport.SOFT_MAX_ARRAY_LENGTH for the same convention.
      */
     long requestedMaxPending =
-        (long) configProvider.getInteger(TRACER_METRICS_MAX_PENDING, 2048) * LEGACY_BATCH_SIZE;
+        (long) configProvider.getInteger(TRACER_METRICS_MAX_PENDING, defaultMaxPending)
+            * LEGACY_BATCH_SIZE;
     tracerMetricsMaxPending = (int) Math.min(requestedMaxPending, MAX_SAFE_ARRAY_SIZE);
 
     reportHostName =
From 9ab5e58e4e93a7dde38cf8655579830a679621a4 Mon Sep 17 00:00:00 2001
From: Douglas Q Hawkins
Date: Fri, 29 May 2026 13:08:41 -0400
Subject: [PATCH 2/3] Sync tracer.metrics.max.pending metadata default to 128
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The default cut from 2048 → 128 needs the matching entry in
metadata/supported-configurations.json so config-inversion tooling and
supported-configuration docs reflect the new value.

DD_TRACE_TRACER_METRICS_MAX_AGGREGATES is left at 2048: the normal-heap
default is unchanged. The metadata schema doesn't support
heap-dependent defaults, so the tight-heap branch (64 / 256) isn't
representable; the metadata reflects the normal-heap default that
applies to the typical customer (Xmx >= 128 MB).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 metadata/supported-configurations.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/metadata/supported-configurations.json b/metadata/supported-configurations.json
index 23e1def3cef..2b428db134d 100644
--- a/metadata/supported-configurations.json
+++ b/metadata/supported-configurations.json
@@ -10925,7 +10925,7 @@
       {
         "version": "A",
         "type": "int",
-        "default": "2048",
+        "default": "128",
         "aliases": []
       }
     ],
From 2b3c494d4436bc3c355b118b0afbf28dc2901219 Mon Sep 17 00:00:00 2001
From: Douglas Q Hawkins
Date: Fri, 29 May 2026 13:56:19 -0400
Subject: [PATCH 3/3] chore(metadata): bump DD_TRACE_TRACER_METRICS_MAX_PENDING version to B

Default changed from 2048 to 128; version field must be incremented when
the default value changes.

Co-Authored-By: Claude Sonnet 4.6
---
 metadata/supported-configurations.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/metadata/supported-configurations.json b/metadata/supported-configurations.json
index 2b428db134d..2c0c6ed6ed1 100644
--- a/metadata/supported-configurations.json
+++ b/metadata/supported-configurations.json
@@ -10923,7 +10923,7 @@
     ],
     "DD_TRACE_TRACER_METRICS_MAX_PENDING": [
       {
-        "version": "A",
+        "version": "B",
         "type": "int",
         "default": "128",
         "aliases": []
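
Reviewer note: the sizing arithmetic quoted in PATCH 1/3 can be checked with a small standalone sketch. This is not code from the series. The class name `MetricsInboxSizing` and its methods are hypothetical, the 120-byte figure is the approximate SpanSnapshot size quoted in the commit message, and `ASSUMED_MAX_SAFE_ARRAY_SIZE` is a stand-in, since the real value of MAX_SAFE_ARRAY_SIZE is not visible in the diff.

```java
// Hypothetical helper, not part of the patch: it only reproduces the sizing arithmetic.
final class MetricsInboxSizing {
  // Values stated in the patch description.
  static final int LEGACY_BATCH_SIZE = 64;
  static final long TIGHT_HEAP_THRESHOLD_BYTES = 128L * 1024 * 1024;

  // Assumption: ~120 B per SpanSnapshot, the estimate quoted in the commit message.
  static final long APPROX_SNAPSHOT_BYTES = 120;

  // Assumption: stand-in for MAX_SAFE_ARRAY_SIZE, whose exact value the diff does not show.
  static final int ASSUMED_MAX_SAFE_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  // Implicit default for the logical max-pending value, chosen by max heap size.
  static int defaultMaxPending(long maxHeapBytes) {
    return maxHeapBytes < TIGHT_HEAP_THRESHOLD_BYTES ? 64 : 128;
  }

  // Physical queue slots: widen to long before multiplying, then clamp,
  // so a very large override cannot wrap to a negative int.
  static int slots(int maxPending) {
    long requested = (long) maxPending * LEGACY_BATCH_SIZE;
    return (int) Math.min(requested, ASSUMED_MAX_SAFE_ARRAY_SIZE);
  }

  // Worst-case in-flight bytes if every slot holds one snapshot.
  static long worstCaseBytes(int maxPending) {
    return slots(maxPending) * APPROX_SNAPSHOT_BYTES;
  }

  public static void main(String[] args) {
    int pending = defaultMaxPending(Runtime.getRuntime().maxMemory());
    System.out.printf(
        "maxPending=%d slots=%d worstCaseBytes=%d%n",
        pending, slots(pending), worstCaseBytes(pending));
  }
}
```

Under these assumptions, 128 logical entries give 8192 slots (983040 bytes, about 1 MB), 64 give 4096 slots (491520 bytes, about 500 KB), and the old default of 2048 gives 131072 slots (15728640 bytes, about 15 MB), which matches the figures in the commit message.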