Skip to content

refactor: share an instrumentation emitter and harden body capture#144

Open
OmarAlJarrah wants to merge 1 commit into
mainfrom
refactor/instrumentation-emitter
Open

refactor: share an instrumentation emitter and harden body capture#144
OmarAlJarrah wants to merge 1 commit into
mainfrom
refactor/instrumentation-emitter

Conversation

@OmarAlJarrah

Copy link
Copy Markdown
Member

Problem

The synchronous (DefaultInstrumentationStep) and asynchronous
(DefaultAsyncInstrumentationStep) instrumentation steps duplicated ~200 lines of
emit / redact / preview / metrics logic, with a standing TODO to deduplicate. The
duplication also let two body-capture problems drift between the two steps:

  1. Time-to-first-byte stall in the sync step. The async step deliberately skips
    body capture for an unknown-length (contentLength() < 0) response body, because
    the bounded drain runs on the future-completion thread and would block on a slow /
    idle producer. The sync step had no such guard: under BODY_AND_HEADERS it drained
    eagerly on the caller's thread, so a streaming endpoint (SSE, long-poll, chunked
    trickle) did not return the response until the preview cap filled or the producer
    reached EOF — turning time-to-first-byte into "however long the first
    bodyPreviewMaxBytes take to arrive."

  2. Preview-capped body size only. The request.body.size / response.body.size
    fields are derived from the capped preview snapshot, so they report
    min(actualSize, bodyPreviewMaxBytes) and saturate at the cap. A dashboard keyed on
    *.body.size cannot tell an 8 KiB body from an 8 GB one.

Change

  • Extract InstrumentationEmitters, an internal class that owns the shared event shape,
    header redaction, body wrapping, metric instruments, and URL redaction. Both steps
    delegate to it and now differ only in how they thread the downstream call (synchronous
    process vs. a CompletableFuture continuation).
  • wrapResponseForLogging applies the contentLength() < 0 skip for both steps, so
    an unknown-length body streams to the caller unwrapped with no preview. The
    http.response event still carries headers, status, and response.content.length = -1.
  • Emit two additional fields per captured body: *.body.actual_size (the true length,
    emitted when known) and *.body.preview_truncated (true when the preview is only a
    prefix of a larger body). Response-body truncation is derived from the existing capture
    via a new internal LoggableResponseBody.isFullyCaptured seam, without re-reading the
    body.
  • Existing metric names (http.client.request.count, http.client.request.duration),
    event names (http.request, http.response), and field names are unchanged.
  • Update HttpInstrumentationOptions KDoc and
    docs/http-body-logging-and-concurrency.md to document the new fields and the now-symmetric
    streaming-body skip.

Rationale

Consolidating the two steps removes the source of drift and gives a single home for the
charset-aware preview and size logic. Skipping the drain for unknown-length bodies keeps
time-to-first-byte gated by network transfer rather than producer pacing, matching the
async step's existing behaviour. The additive size fields make the true body size
observable without breaking dashboards that read the existing *.body.size field.

API impact

No public-API change. The new isFullyCaptured accessor is internal / @JvmSynthetic;
apiCheck passes with no regenerated snapshots.

Gated build commands run (all passed)

  • ./gradlew :sdk-core:test :sdk-core:ktlintCheck :sdk-core:detekt :sdk-core:apiCheck --no-daemon — BUILD SUCCESSFUL
  • ./gradlew :sdk-core:test :sdk-core:ktlintCheck :sdk-core:detekt --no-daemon (after adding the async size test) — BUILD SUCCESSFUL

Closes #26
Closes #107
Closes #108

The sync and async instrumentation steps duplicated ~200 lines of
emit/redact/preview/metrics logic. Extract a shared InstrumentationEmitters
that owns the event shape, header redaction, body wrapping, and metric
instruments; both steps now delegate to it and differ only in how they thread
the downstream call (synchronous process vs. a CompletableFuture continuation).

While consolidating, fix two body-capture problems the shared path now handles
once for both steps:

- Stop the sync step from stalling time-to-first-byte. The async step already
  skipped body capture for unknown-length (contentLength() < 0) response bodies
  because the bounded drain runs on the completion thread and would block on a
  slow producer; the sync step had no such guard and drained eagerly on the
  caller's thread, so an SSE / long-poll / chunked-trickle endpoint logged at
  BODY_AND_HEADERS did not return the response until the preview cap filled or
  the producer hit EOF. The shared wrapResponseForLogging applies the
  contentLength() < 0 skip to both steps, so unknown-length bodies stream to the
  caller unwrapped with no preview.

- Make the true body size observable. The *.body.size fields are derived from
  the capped preview snapshot, so they saturate at bodyPreviewMaxBytes and a
  dashboard cannot tell an 8 KiB body from an 8 GB one. The events now also
  carry *.body.actual_size (the full length, emitted when known) and
  *.body.preview_truncated (true when the preview is only a prefix). Existing
  field names and metric/event names are unchanged.

Docs and the HttpInstrumentationOptions KDoc are updated to describe the new
fields and the symmetric streaming-body skip.

Closes #26
Closes #107
Closes #108
@OmarAlJarrah OmarAlJarrah changed the title Share an instrumentation emitter and harden body capture refactor: share an instrumentation emitter and harden body capture Jun 17, 2026
@OmarAlJarrah

Copy link
Copy Markdown
Member Author

This cleanly factors the duplicated emit/redact/preview/metrics logic out of the two instrumentation steps into InstrumentationEmitters, adds the new *.body.actual_size / *.body.preview_truncated fields, and aligns the sync step with the async one by skipping capture of unknown-length response bodies. The shared seam and the isFullyCaptured derivation are a solid improvement over the old copy-paste. Approving — just two low-severity things worth tidying before or shortly after merge.

Issues

  • Request and response preview_truncated disagree at the exact cap boundarysdk-core/src/main/kotlin/org/dexpace/sdk/core/http/pipeline/steps/InstrumentationEmitters.kt:251-284. The request path uses actualSize > cap (so a body of exactly cap bytes reports false), while the response path uses !isFullyCaptured. In LoggableResponseBody.drainAndCache, when the body length equals maxCaptureBytes the read loop consumes all bytes and exits at remaining == 0 without ever observing EOF, so fullyCaptured stays false and a liveTail is retained. For a known-length response whose length equals the cap, that yields three inconsistencies: response.body.preview_truncated=true even though the whole body was captured; an event where actual_size == size yet preview_truncated == true; and source() handing out the single-use PrefixThenTailSource for a body that actually fit entirely in the preview (a repeatability regression at the boundary). The underlying off-by-one is pre-existing in LoggableResponseBody's over-cap detection, but this PR newly surfaces it through the field and the divergence from the request-side formula. The two formulas should agree — easiest fix is to detect EOF at the boundary (e.g. attempt one more read when remaining hits 0) so fullyCaptured reflects an exactly-cap-sized body.

Worth double-checking

  • Should the sync path mark why a body preview is absent?sdk-core/src/main/kotlin/org/dexpace/sdk/core/http/pipeline/steps/InstrumentationEmitters.kt:119-126. Skipping capture for contentLength() < 0 on the sync path is a deliberate, tested change to avoid blocking the caller on a slow/idle producer, so the behavior itself looks right. But it does mean chunked/unknown-length responses (very common for JSON APIs) now log no response.body.size / preview / actual_size on a path that previously captured them, with nothing in the event distinguishing "intentionally skipped" from a logging misconfiguration. Would it be worth emitting an explicit marker (e.g. response.body.capture_skipped="unknown_length") so the absence is self-describing? Not blocking, but it's the most user-visible effect of this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant