fix(billing): retry refund correlation backfill 3x + dead-letter #3690
PaymentIntent metadata backfill on checkout.session.completed now retries 3 times with exponential backoff (200ms, 400ms, 800ms) before escalating. On exhaustion, a persistent BillingFailedBackfill dead-letter record is written so operators can manually patch missing PI metadata and preserve refund correlation.

Strategy chosen: Option B, a new BillingFailedBackfill model (no prior mechanism for this specific failure path). The existing ProcessedStripeEvent dead-letter covers Stripe event delivery failures; this addresses a different failure class (Stripe API write failure on a side-effect within a successfully-delivered event handler).

BillingFailedBackfill model auto-registers via assets.js glob: modules/*/models/*.mongoose.js (no billing.init.js change required).

Closes audit P1 (2026-05-21): silent warn-log path on refund correlation.
- IMPORTANT: JSDoc delay sequence corrected (200 → 400, not 200 → 400 → 800; for default attempts=3, baseMs=200 only 2 delays fire)
- IMPORTANT: BillingFailedBackfill access moved to billing.failedBackfill.repository.js (Option B); lazy mongoose.model() removed from service layer; test mocks updated to stub the repository boundary instead of mongoose.model()
- IMPORTANT: sparse index → partial index on resolvedAt (sparse is a no-op because the field has default: null, so all docs include it; partial targets only unresolved docs)
- MINOR: RUNBOOK section 6 symptom includes [billing.webhook] prefix
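The corrected delay sequence follows from where the sleep sits in the loop: a delay is only needed between attempts, so three attempts produce two delays. The following is a minimal sketch under that assumption, not the code in billing.retry.js. The `sleep` option and the `observedDelays` function are hypothetical additions that make the delays observable without waiting.

```javascript
// Illustrative sketch only; the real helper in modules/billing/lib/billing.retry.js may differ.
// The `sleep` option is a hypothetical seam so the delays can be recorded.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff(fn, { attempts = 3, baseMs = 200, sleep = defaultSleep } = {}) {
  let lastErr;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastErr = err;
      // No delay after the final attempt, so attempts=3 sleeps only twice.
      if (attempt < attempts) {
        await sleep(baseMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastErr;
}

// Records the delays requested while every attempt fails.
async function observedDelays() {
  const delays = [];
  const recordingSleep = async (ms) => {
    delays.push(ms);
  };
  try {
    await retryWithBackoff(async () => {
      throw new Error('stripe unavailable');
    }, { sleep: recordingSleep });
  } catch (err) {
    // Expected: the last error is rethrown once attempts are exhausted.
  }
  return delays; // [200, 400] with the defaults
}
```

An 800ms delay would only appear with a fourth attempt.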
Caution: Review failed. Failed to post review comments.
Walkthrough
Adds an exponential backoff retry helper, a Mongoose dead-letter model and repository for PaymentIntent metadata backfill failures, integrates retries + dead-letter writes into the billing webhook, adds unit tests for retry/failed-write scenarios, and documents runbook triage steps.
Changes
PaymentIntent Backfill Resilience
Sequence Diagram
sequenceDiagram
participant CheckoutHandler as handleCheckoutPaymentCompleted
participant RetryHelper as retryWithBackoff
participant StripeAPI as stripe.paymentIntents.update
participant DeadLetterRepo as BillingFailedBackfillRepository
participant Logger as logger
CheckoutHandler->>RetryHelper: call retryWithBackoff(() => StripeAPI(update metadata))
RetryHelper->>StripeAPI: paymentIntents.update(...) attempt
alt succeeds within attempts
StripeAPI-->>RetryHelper: success
RetryHelper-->>CheckoutHandler: resolve
else all attempts fail
StripeAPI-->>RetryHelper: error
RetryHelper-->>CheckoutHandler: reject (last error)
CheckoutHandler->>Logger: error("backfill failed after retries", paymentIntentId)
CheckoutHandler->>DeadLetterRepo: record({ paymentIntentId, stripeSessionId, error })
alt dead-letter write fails
DeadLetterRepo-->>CheckoutHandler: error
CheckoutHandler->>Logger: error("failed to persist backfill record", paymentIntentId)
end
CheckoutHandler-->>CheckoutHandler: handler resolves (no throw)
end
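The flow in the diagram can be sketched as a function with injected dependencies, so it runs without Stripe or MongoDB. This is an illustration under stated assumptions, not the handler in billing.webhook.service.js. The function name `backfillPaymentIntentMetadata`, the `deps` shape, and the returned status strings are hypothetical; the log messages and the `record` payload fields come from the diagram.

```javascript
// Illustrative sketch of the escalation path; the names noted above are assumptions.
async function backfillPaymentIntentMetadata(session, deps) {
  const { stripe, retry, failedBackfillRepo, logger } = deps;
  const paymentIntentId = session.payment_intent;
  const stripeSessionId = session.id;

  try {
    // The retry helper owns the attempts and the backoff between them.
    await retry(() => stripe.paymentIntents.update(paymentIntentId, { metadata: session.metadata }));
    return 'patched';
  } catch (err) {
    logger.error('backfill failed after retries', { paymentIntentId, stripeSessionId, stack: err?.stack });
    try {
      // Persist a record so operators can patch the metadata manually.
      await failedBackfillRepo.record({ paymentIntentId, stripeSessionId, error: err.message });
      return 'dead-lettered';
    } catch (writeErr) {
      // Last resort: the dead-letter write itself failed, so only the log remains.
      logger.error('failed to persist backfill record', { paymentIntentId, stripeSessionId, stack: writeErr?.stack });
      return 'log-only';
    }
  }
}
```

Every branch resolves rather than throws, which matches the diagram's "handler resolves (no throw)" step: a failed side-effect should not cause Stripe to redeliver an event that was otherwise handled.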
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 5 passed
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #3690 +/- ##
==========================================
+ Coverage 89.66% 89.70% +0.04%
==========================================
Files 140 142 +2
Lines 4759 4780 +21
Branches 1491 1498 +7
==========================================
+ Hits 4267 4288 +21
Misses 385 385
Partials 107 107
Pull request overview
Adds retry + dead-letter handling for the Stripe PaymentIntent metadata backfill performed during checkout.session.completed, to prevent silent failures that later break refund correlation and require manual recovery.
Changes:
- Wrap PaymentIntent metadata backfill in retryWithBackoff (3 attempts, exponential backoff) and escalate on exhaustion.
- Persist exhausted backfill failures to a new BillingFailedBackfill dead-letter collection (repository + model) and document manual reconciliation steps in the billing runbook.
- Add targeted unit tests covering retry success, retry exhaustion → dead-letter, and dead-letter write failure handling.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| modules/billing/services/billing.webhook.service.js | Adds retry/backoff around PI metadata patch and records exhausted failures to a backfill DLQ with error logging. |
| modules/billing/lib/billing.retry.js | Introduces a shared retryWithBackoff helper used by the webhook backfill. |
| modules/billing/repositories/billing.failedBackfill.repository.js | Adds a minimal repository abstraction for recording backfill dead-letter entries. |
| modules/billing/models/billing.failedBackfill.model.mongoose.js | Defines the BillingFailedBackfill Mongoose model and unresolved-only index for ops queries. |
| modules/billing/tests/billing.refund-correlation.unit.tests.js | Adds unit coverage for retry behavior and DLQ escalation/robustness. |
| modules/billing/RUNBOOKS.md | Adds a new runbook section for manual remediation of backfill failures. |
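The "unresolved-only index" in the model row can be illustrated with a small helper that registers a partial index on a Mongoose schema. This is a sketch, assuming the index is keyed on resolvedAt and filtered on resolvedAt being null; the PR's actual index definition is not shown here and may use different keys or a different filter. The helper name `addUnresolvedIndex` is hypothetical.

```javascript
// Illustrative sketch; the index keys and filter are assumptions, not the PR's definition.
// `schema` is expected to be a Mongoose Schema (anything exposing index(fields, options)).
function addUnresolvedIndex(schema) {
  // A sparse index skips only documents where the field is absent. With `default: null`
  // the field is always present, so a sparse index would still cover every document.
  // A partial index filters on the value instead, so resolved documents are left out.
  schema.index(
    { resolvedAt: 1 },
    { partialFilterExpression: { resolvedAt: null } },
  );
  return schema;
}
```

Queries can only use a partial index when their filter implies the index's filter expression, so an ops query for open records would need to include `resolvedAt: null`.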
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modules/billing/lib/billing.retry.js`:
- Around line 17-30: The retryWithBackoff function should validate its options
before the loop: check that attempts is a finite positive integer (>=1) and
baseMs is a finite non-negative number; if not, throw a clear TypeError
describing the invalid option(s). Add this validation at the start of
retryWithBackoff (before using attempts or baseMs) so you never enter the loop
with attempts <= 0 or non-integer values and avoid throwing undefined via
lastErr; reference the retryWithBackoff function and the attempts/baseMs
parameters when implementing the checks.
In `@modules/billing/tests/billing.refund-correlation.unit.tests.js`:
- Around line 26-31: Add a JSDoc header above the standalone makeSession helper:
include a one-line description, a `@returns` tag describing the returned session
object shape (id, payment_status, payment_intent, metadata with organizationId,
packId, kind) and types (e.g., `@returns` {Object}), and if you prefer to be
explicit add an empty `@param` comment noting no parameters; attach this header
immediately above the makeSession function declaration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed6dfd15-0206-4beb-806d-e58f3fd851ce
📒 Files selected for processing (6)
- modules/billing/RUNBOOKS.md
- modules/billing/lib/billing.retry.js
- modules/billing/models/billing.failedBackfill.model.mongoose.js
- modules/billing/repositories/billing.failedBackfill.repository.js
- modules/billing/services/billing.webhook.service.js
- modules/billing/tests/billing.refund-correlation.unit.tests.js
- guard attempts (positive integer) + baseMs (non-negative finite) with TypeError; prevents entering loop with attempts<=0 (would throw undefined)
- JSDoc header on makeSession test helper

Addresses CodeRabbit re-review on PR #3690.
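The guard described in this commit can be sketched as a standalone validation step. This is a minimal illustration, assuming the checks run before the retry loop starts. The function name `assertRetryOptions` and the error message wording are hypothetical, not taken from billing.retry.js.

```javascript
// Illustrative sketch of the option guard; name and messages are assumptions.
function assertRetryOptions({ attempts, baseMs }) {
  // Number.isInteger rejects NaN, Infinity, fractions, and non-number values.
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new TypeError(`retryWithBackoff: attempts must be a positive integer, got ${String(attempts)}`);
  }
  // Number.isFinite rejects NaN, Infinity, and non-number values without coercion.
  if (!Number.isFinite(baseMs) || baseMs < 0) {
    throw new TypeError(`retryWithBackoff: baseMs must be a non-negative finite number, got ${String(baseMs)}`);
  }
}
```

Without the guard, attempts <= 0 skips the loop entirely, so the helper reaches its final throw with no error captured and rejects with undefined.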
Stale: this CHANGES_REQUESTED targets c39b9c3 (06:51:16Z). Both findings — (1) retryWithBackoff attempts/baseMs validation, (2) makeSession JSDoc — fixed in dbb9c25 (06:53:34Z, postdates this review) + 2 guard tests added, 1484/1484 unit tests green. CR's incremental engine did not auto-re-review the new commit. Dismissing per stale-review convention.
…ill logs

Fixes two logger.error calls (lines 332 and 344 of billing.webhook.service.js):
- rename sessionId key to stripeSessionId for consistency with the rest of the file
- add stack: err?.stack to both meta objects, matching surrounding logger.error style

Also updates billing.refund-correlation unit test header comment to reference BillingFailedBackfillRepository.record instead of the stale BillingFailedBackfill.create.
…correlation-retry

# Conflicts:
# modules/billing/RUNBOOKS.md
…errors (#3697)

* fix(billing): short-circuit retryWithBackoff on non-transient Stripe errors

retryWithBackoff retried every error type, wasting ~600ms of backoff on deterministic StripeInvalidRequestError before dead-lettering. Add an optional shouldRetry(err) predicate (default: always retry) and pass a non-transient classifier at the PI-backfill call site.

The err.stack finding from the issue is already satisfied by #3690 (both logger.error calls include stack), so no change is needed.

Closes #3691

* test(billing): cover StripeInvalidRequestError class-name branch in retry short-circuit

The shouldRetry predicate guards both Stripe error type spellings (StripeInvalidRequestError class name + invalid_request_error raw type, which vary by stripe-node version). Add a parallel short-circuit test for the class-name value and align the existing test's predicate with the production one. Closes a coverage gap flagged by the pre-push gate.

* fix(billing): address reviewer nits - narrow comment + improve log wording

- Narrow shouldRetry JSDoc comment to only mention invalid_request_error (not auth errors); matches what the predicate actually checks (Copilot thread PRRT_kwDOBss37M6EM5HL)
- Change failure log from "failed after retries" to "failed (retries exhausted or skipped)" to reflect short-circuit path (CodeRabbit thread PRRT_kwDOBss37M6EM5c1)
- Update billing.refund-correlation tests to match new log wording
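The short-circuit can be sketched as a predicate passed into the retry helper. This is an illustration under assumptions, not the production code: it assumes the classifier inspects `err.type` for the two spellings named in the commit message. The names `NON_TRANSIENT_TYPES` and `isTransientStripeError` are hypothetical.

```javascript
// Illustrative sketch; the classifier's name and the use of err.type are assumptions.
const NON_TRANSIENT_TYPES = new Set(['StripeInvalidRequestError', 'invalid_request_error']);

// Returns false for deterministic request errors, where another attempt cannot succeed.
const isTransientStripeError = (err) => !NON_TRANSIENT_TYPES.has(err?.type);

async function retryWithBackoff(fn, { attempts = 3, baseMs = 200, shouldRetry = () => true } = {}) {
  let lastErr;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Stop immediately on the last attempt or when the error is classified as permanent.
      if (attempt === attempts || !shouldRetry(err)) break;
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** (attempt - 1)));
    }
  }
  throw lastErr;
}
```

With the default options a permanent error is rejected after one attempt instead of after 200ms + 400ms of backoff, and callers that pass no predicate keep the retry-everything behavior.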
Summary
- … checkout.session.completed handler with retryWithBackoff (3 attempts, 200ms + 400ms exp delays)
- … BillingFailedBackfill dead-letter collection (via minimal repository abstraction)

Context: audit P1 (2026-05-21)
Devkit audit flagged billing.webhook.service.js:314-330 as P1: the metadata backfill failed silently with logger.warn + continue. Later refund webhook fell into "unresolved" path → customer credit never applied, runbook-only recovery. This PR adds defense in depth.

Strategy: Option B (new model + repository, no existing dead-letter mechanism for this concern). PR #3603's ProcessedStripeEvent dead-letter covers event-delivery failures (exhausted webhook retries), which is a different class.

Reviews completed before push:
- … billing.webhook.checkout.unit.tests.js:286)
- … c39b9c3f

Test plan
- NODE_ENV=devkit npm run lint: clean
- NODE_ENV=test npm run test:unit -- billing.refund-correlation: 3/3 pass
- NODE_ENV=test npm run test:unit -- billing.webhook: 1530/1530 pass

Followup
- … billing.webhook.checkout.unit.tests.js:286 (pre-existing test, now slow due to retry; out of scope here)

Plan
docs/superpowers/plans/2026-05-21-devkit-audit-p0-p1-fixes.md Phase 2.

Summary by CodeRabbit
New Features
Documentation
Tests