Context
Phase 2 PR #3690 (refund correlation backfill retry) ships retryWithBackoff that retries on ALL error types. Independent /critical-review fallback flagged 1 medium + 1 low for a future polish pass.
Findings
Medium — Blind retry on non-transient Stripe errors
modules/billing/lib/billing.retry.js:17-29 — retryWithBackoff has no error filter; retries on invalid_request_error, idempotency_error just as eagerly as on 500s.
invalid_request_error (bad paymentIntentId, auth failure) : deterministic failure → wastes 600ms before dead-letter
idempotency_error (duplicate request key) : retry may produce different result but amplifies webhook handler latency
Fix: short-circuit on err.type === 'invalid_request_error' (StripeInvalidRequestError) — skip retry, go straight to dead-letter.
Low — err.stack dropped in new error log
modules/billing/services/billing.webhook.service.js:331 — the replaced logger.warn included stack: err?.stack ; the new logger.error (more severe!) omits it.
Inconsistent with other logger.error calls on evtErr in same file (lines 1021, 1054, 1214, 403, 438) which all include evtErr?.stack.
Fix: add stack: err?.stack to the meta object on both new logger.error calls (outer + inner DLQ).
Acceptance
- Retry loop short-circuits on non-transient Stripe error types
- err.stack included on both new logger.error calls (PI backfill exhaustion + DLQ write failure)
- Unit tests cover the short-circuit path (assert 1 call, not 3, on
invalid_request_error)
Source
Discovered by /critical-review Claude fallback on PR #3690 (CR rate-limited 2026-05-21).
Context
Phase 2 PR #3690 (refund correlation backfill retry) ships
retryWithBackoffthat retries on ALL error types. Independent /critical-review fallback flagged 1 medium + 1 low for a future polish pass.Findings
Medium — Blind retry on non-transient Stripe errors
modules/billing/lib/billing.retry.js:17-29—retryWithBackoffhas no error filter; retries oninvalid_request_error,idempotency_errorjust as eagerly as on 500s.invalid_request_error(badpaymentIntentId, auth failure) : deterministic failure → wastes 600ms before dead-letteridempotency_error(duplicate request key) : retry may produce different result but amplifies webhook handler latencyFix: short-circuit on
err.type === 'invalid_request_error'(StripeInvalidRequestError) — skip retry, go straight to dead-letter.Low —
err.stackdropped in new error logmodules/billing/services/billing.webhook.service.js:331— the replacedlogger.warnincludedstack: err?.stack; the newlogger.error(more severe!) omits it.Inconsistent with other
logger.errorcalls onevtErrin same file (lines 1021, 1054, 1214, 403, 438) which all includeevtErr?.stack.Fix: add
stack: err?.stackto the meta object on both new logger.error calls (outer + inner DLQ).Acceptance
invalid_request_error)Source
Discovered by
/critical-reviewClaude fallback on PR #3690 (CR rate-limited 2026-05-21).