Skip to content

feat: add async retry and bearer-auth pipeline steps#157

Open
OmarAlJarrah wants to merge 1 commit into
mainfrom
feat/async-pipeline-steps
Open

feat: add async retry and bearer-auth pipeline steps#157
OmarAlJarrah wants to merge 1 commit into
mainfrom
feat/async-pipeline-steps

Conversation

@OmarAlJarrah

Copy link
Copy Markdown
Member

Summary

Adds the async (AsyncHttpStep) counterparts for the RETRY and AUTH pillar
stages, so calls run through the async pipeline get the same retry and authentication
behaviour the synchronous pipeline already has.

Async retry (DefaultAsyncRetryStep, AsyncRetryStep)

Mirrors DefaultRetryStep's policy exactly — it reuses HttpRetryOptions, the shared
BackoffCalculator, RetryAfterParser, the same Retry-After header set, and the same
idempotency-aware re-sendability gating (idempotent method or replayable body; a bare
non-idempotent POST or a non-replayable body is not retried).

  • Non-blocking delays: backoff is scheduled on a ScheduledExecutorService via
    Futures.delay — no Thread.sleep, no Timer. While a delay is pending the
    dispatching thread is free.
  • Stack-safe loop: the retry loop is driven by an iterative trampoline rather than a
    per-attempt thenCompose chain, so a long (even zero-delay) retry sequence does not
    grow the call stack or the future continuation graph. A 5,000-attempt test proves this.
  • Failure handling: CompletionException wrappers are unwrapped before classification;
    Error is never retried; prior attempts are attached as suppressed on the terminal
    exception.

Async bearer auth (AsyncBearerTokenAuthStep, AsyncAuthStep)

Adds a non-blocking BearerTokenProvider.fetchAsync seam (default forwards to the blocking
fetch; adapter modules override to dispatch off-thread). The step:

  • Background refresh: a token that is still valid but inside the refresh margin is
    returned and stamped immediately while the refresh runs off-thread — the in-flight
    request never waits on the token endpoint.
  • Single-flight: concurrent requests that all see an expiring or missing token share
    one in-flight provider call, so they don't stampede the token endpoint.
  • Keeps the HTTPS-only credential guard, cross-origin credential suppression, and
    401 + WWW-Authenticate: Bearer token eviction + single-retry behaviour from the
    synchronous BearerTokenAuthStep.

Per-cloud token providers (GCP/Azure/Kubernetes) and OAuth token-exchange specifics are
deliberately kept out of sdk-core — they belong in adapter modules and override
fetchAsync.

Testing

A new ManualScheduler test fixture drives the scheduled delays deterministically, so the
async retry tests assert the full backoff/Retry-After schedule with no real sleeps. The
async auth tests use deferred CompletableFutures to prove the valid-but-expiring token is
stamped without blocking and that concurrent fetches coalesce.

Closes #31
Closes #32

Gated build (module-scoped, run locally)

./gradlew :sdk-core:apiDump --no-daemon                # regenerated api/sdk-core.api committed
./gradlew :sdk-core:test --no-daemon                   # full suite green
./gradlew :sdk-core:ktlintCheck :sdk-core:detekt :sdk-core:apiCheck --no-daemon   # green

All passed. apiDump was run for the intentional public-API additions and the regenerated
api/sdk-core.api is committed.

Add the AsyncHttpStep counterparts for the RETRY and AUTH pillar stages so
async calls get the same retry and authentication behaviour as the synchronous
pipeline.

DefaultAsyncRetryStep mirrors DefaultRetryStep's policy exactly — it reuses
HttpRetryOptions, the shared BackoffCalculator, RetryAfterParser, the same
Retry-After header set, and the same idempotency-aware re-sendability gating.
Backoff delays are scheduled on a ScheduledExecutorService via Futures.delay
rather than blocking a thread, and the retry loop is driven by an iterative
trampoline (no per-attempt thenCompose recursion), so a long zero-delay retry
sequence stays stack-safe.

AsyncBearerTokenAuthStep stamps Authorization: Bearer via a new non-blocking
BearerTokenProvider.fetchAsync seam. A token that is still valid but inside the
refresh margin is returned and stamped immediately while a refresh runs
off-thread; concurrent requests that observe an expiring or missing token share
a single in-flight fetch (single-flight) so they don't stampede the token
endpoint. The HTTPS guard, cross-origin credential suppression, and
401-challenge token eviction match the synchronous BearerTokenAuthStep.

A ManualScheduler test fixture drives the scheduled delays deterministically so
the retry tests run without real sleeps.
@OmarAlJarrah

Copy link
Copy Markdown
Member Author

This adds the async (CompletableFuture-based) counterparts of the AUTH and RETRY pillar steps, plus BearerTokenProvider.fetchAsync defaults and a ManualScheduler test fixture. The API additions are correctly reflected in sdk-core.api and the core compiles clean. The mechanics (single-flight refresh, trampolined non-blocking backoff, the 5000-attempt stack-safety check) look solid, but the async retry step drops two guarantees the sync step makes and that its own KDoc claims to preserve. This needs rework before merging.

Issues

Blocking

  • A throwing retry predicate hangs the caller's future and leaks the open responseDefaultAsyncRetryStep.kt:772-810. The downstream attempt is wired up as attempt.whenComplete { response, error -> handleOutcome(...) } with no surrounding try/catch, and the per-call result future is only completed inside the success/failure branches. onSuccess calls shouldRetryResponse(response) before closeQuietly(response), and invokeShouldRetry rethrows a misbehaving predicate as IllegalStateException. That exception unwinds out of onSuccess into the whenComplete callback, whose return value is discarded, so result is never completed (the caller blocks forever) and the still-open retryable response leaks its socket/buffer. The same hang applies to onFailure via shouldRetryException. The sync DefaultRetryStep.decideRetryResponse deliberately wraps these calls in a try/catch that closes the response and rethrows so the caller sees an IOException; that guard is missing here.

  • computeResponseDelay/computeExceptionDelay throwing hangs and leaks the same wayDefaultAsyncRetryStep.kt:200-205. Same defect, second throw site. In onSuccess, computeResponseDelay(response, tryCount) runs before closeQuietly, so an override of the delay computation (the class is open and the sync sibling documents these as extension points) or any unexpected throw from BackoffCalculator.computeDelay escapes through the discarded whenComplete callback and leaves result uncompleted with the response open. Whatever try/catch fixes the predicate path must also cover the delay-computation calls, completing result exceptionally and closing any open response.

  • InterruptedIOException is treated as retryable, diverging from the cancellation policyDefaultAsyncRetryStep.kt:812-837. onFailure special-cases only is Error, then classifies everything else through shouldRetryException, which (via RetryUtils.isRetryable) returns true for any IOException in the cause chain. InterruptedIOException extends IOException, so an interrupt-signalling failure from a blocking-transport-bridged chain gets retried. The sync DefaultRetryStep intercepts InterruptedIOException/InterruptedException before classification, restores the interrupt flag, and surfaces a terminal InterruptedIOException with prior failures attached, per the SDK-wide cancellation convention. The async step's KDoc asserts "the exact same policy as the synchronous stack," but the interrupt carve-out is absent. Please mirror the sync interception here.

Nit

  • No test covers the throwing-predicate or interrupt pathsDefaultAsyncRetryStepTest.kt. The suite mirrors the sync cases (status/exception retry, idempotency gating, Retry-After, body close, stack safety) but has no case where shouldRetryCondition/shouldRetryException throws, nor where the downstream completes with an InterruptedIOException. Those are exactly the behaviors that differ from the sync step, so the gap is what let the issues above through. Worth adding alongside the fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an async bearer-auth step with background token refresh Add an async retry step at the RETRY stage

1 participant