feat(x402): buyer-side settlement, HoldSign/ReleaseSpend auth lifecycle #343
Open
HananINouman wants to merge 7 commits into main from
## Buyer sidecar: auth drain fix (signer.go)

Splits the old single Sign() call into a three-phase lifecycle that prevents auth pool drain when a paid request fails after the auth has been popped:

- HoldSign: pops an auth from the pool and builds the payment payload but does NOT mark it consumed (onConsume is not called).
- ConfirmSpend: persists the nonce as consumed only after the upstream response succeeds AND facilitator settlement succeeds.
- ReleaseSpend: returns the held auth to the front of the pool and decrements the spent counter. Used on upstream ≥400 or settlement failure so the auth can be retried.
- Sign() is kept as a convenience wrapper: HoldSign → ConfirmSpend.

## Buyer sidecar: direct facilitator settlement (settlement.go, proxy.go)

New settlement.go: the buyer calls the facilitator's /settle endpoint directly when the sell-side ForwardAuth runs with verifyOnly=true and therefore returns no X-PAYMENT-RESPONSE header.

- facilitatorSettle(): marshals PaymentPayload + PaymentRequirements, POSTs to <facilitatorURL>/settle, and parses the SettlementResponse.
- Uses a local defaultFacilitatorURL constant instead of importing internal/x402, to avoid an import cycle (x402 test files import buyer).

proxy.go changes:

- FacilitatorURL added to replayableX402Transport (sourced from the per-upstream UpstreamConfig).
- selectAndHoldPayment(): replaces the old SelectAndSign path for PreSignedSigner; calls HoldSign instead of Sign.
- releaseHeldPreSignedSpend(): returns the auth to the pool on failure.
- ensureSettlement(): checks X-PAYMENT-RESPONSE first; if it is absent (verifyOnly sell side), calls facilitatorSettle directly.
- newErrorResponse(): synthetic 503 returned to LiteLLM on settlement failure.
- Full RoundTrip flow: first request → 402 → HoldSign → retry with X-PAYMENT. Then: upstream ≥400 → release; settlement fails → release + 503; settlement succeeds → ConfirmSpend → return response.
## Buyer sidecar: FacilitatorURL per upstream (config.go)

Added FacilitatorURL string (json:"facilitatorURL,omitempty") to UpstreamConfig so each upstream can override the facilitator. Falls back to defaultFacilitatorURL when empty.

## ServiceOffer controller: propagate facilitatorURL (purchase.go, purchase_helpers.go)

reconcilePurchaseConfigure now writes "facilitatorURL": x402pkg.DefaultFacilitatorURL into the upstream map stored in the buyer ConfigMap, so the sidecar knows which facilitator to call for settlement.

addLiteLLMModelEntry (purchase_helpers.go) now hardcodes api_base as http://127.0.0.1:8402/v1 (with the /v1 suffix) for every explicit paid/* entry it writes. This ensures LiteLLM routes to the correct path even on clusters that predate the template fix.

## Infrastructure template fixes (llm.yaml, x402.yaml)

llm.yaml: fixed api_base for the paid/* wildcard route from http://127.0.0.1:8402 to http://127.0.0.1:8402/v1. The missing /v1 caused LiteLLM to receive 404s from the buyer sidecar.

x402.yaml: changed the verifyOnly default from false to true. The ForwardAuth verifier must only verify payments: settling on the auth hop breaks Traefik, and the facilitator rejects base64-encoded payloads. Settlement is now the buyer sidecar's responsibility. Also added optional: true to the CA cert volume mount, and RBAC for litellm-secrets so the controller can hot-add paid models.

## ForwardAuth verifier hardening (forwardauth.go)

Added validation in facilitatorVerify and facilitatorSettle: both reject an empty or non-JSON paymentPayload before sending it to the facilitator. Facilitators expect a JSON object; sending a base64 string causes unsupported_scheme. Renamed the parameter payloadBytes → paymentPayloadJSON for clarity.

## Standalone inference gateway (gateway.go)

Added NoPaymentGate bool to GatewayConfig. It disables the x402 payment middleware when the gateway runs behind the cluster's x402 verifier (avoids double-gating). Added a ValidateFacilitatorURL check on startup.
Added normalizeServicePrefixedPath() to strip the /services/<name>/ prefix so requests reach the correct OpenAI routes.

## Test fixes (proxy_test.go, buy_side_test.go, signer_test.go, forwardauth_test.go)

All tests updated for the new settlement flow. Mock upstream 200 responses now set X-PAYMENT-RESPONSE, so ensureSettlement() short-circuits via the header path instead of calling the real facilitator over the network. Added X-PAYMENT-RESPONSE to startMockX402Seller in buy_side_test.go. TestLLMTemplate_IncludesPaidRouteAndBuyerSidecar (stack_test.go) now asserts that api_base includes /v1. The flow-06-sell-setup.sh assertion now expects verifyOnly: true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ell http

The x402-verifier image is distroless (no CA store). Without the ca-certificates ConfigMap populated from the host, the verifier cannot TLS-verify calls to https://x402.gcp.obol.tech, causing all paid requests to fail with "Payment verification failed" (x509: certificate signed by unknown authority).

Root cause: `obol sell http` never called Setup()/populateCABundle(). Only `obol sell pricing` triggered it, so running sell http on a fresh cluster left the CA bundle empty.

Fix:
- internal/x402/setup.go: call populateCABundle inside EnsureVerifier() so it runs whenever the verifier is deployed; export PopulateCABundle for callers that bypass EnsureVerifier
- internal/stack/stack.go: call x402verifier.PopulateCABundle after infrastructure deployment during `obol stack up`
- cmd/obol/sell.go: call x402verifier.PopulateCABundle in sellHTTPCommand before creating the ServiceOffer CR

Also updates CLAUDE.md with pitfall #7 (LiteLLM hot-add broken with subPath mount), pitfall #8 (CA bundle), the sell http flag reference, direct HTTP buy format notes, and the buy.py command reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
x402 / buyer sidecar:
- Revert direct-facilitator settlement from proxy RoundTrip; settlement stays as an X-PAYMENT-RESPONSE header read (matching main semantics)
- Remove the now-dead settlement.go and FacilitatorURL from UpstreamConfig
- Keep the HoldSign/ReleaseSpend auth lifecycle in PreSignedSigner
- verifyOnly: true is permanent for the Traefik ForwardAuth path
- api_base for the paid/* LiteLLM route reverted to bare :8402 (the buyer proxy registers both /chat/completions and /v1/chat/completions)

Linux rc5 bug fixes:
- checkPortsAvailable: use /proc/net/tcp on Linux to detect privileged port occupancy when net.Listen returns permission denied
- dockerBridgeGatewayIP: check FlagUp before using docker0; fall back to an active br-* bridge if docker0 is down
- erc8004 SignTxRequest: chain_id serialised as int64 (was string), fixing HTTP 422 from the Rust remote-signer, which expects u64
- obolup.sh: test /dev/tty openability before the interactive prompt to prevent a crash in non-interactive environments

Docs:
- CLAUDE.md/README: clarify that direct X-PAYMENT via Traefik is dev/debug only (no settlement); x402-buyer and obol sell inference are the supported buyer paths
- Fix the pitfall numbering gap (6,8 → 6,7) after removing the stale /v1 note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ider does not add it automatically

Without /v1 in api_base, LiteLLM calls /chat/completions on the buyer sidecar. The buyer's mux returns Go's default 404 "page not found", which LiteLLM surfaces as OpenAIException - 404 page not found. The /v1 suffix is required in the static litellm-config ConfigMap template and in the controller's addLiteLLMModelEntry helper.

Verified end-to-end: paid/qwen3.5:4b returns HTTP 200, and the sidecar shows remaining=2 spent=1 after one call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The subPath mount means ConfigMap updates NEVER propagate to a running LiteLLM pod. The restart step after buy.py buy is mandatory, not optional. Expand pitfall #7 with the symptom, the required command, and the reason, so Claude Code stops suggesting "wait for ConfigMap propagation" as a workaround.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ForwardAuth hardening
Fixed ForwardAuth facilitator payload format (internal/x402/forwardauth.go) — sends decoded JSON, not raw base64. Standardised verifyOnly: true cluster-wide (x402.yaml, setup.go). This is permanent: settling inside ForwardAuth fires before Traefik forwards to the upstream, so it would debit the payer before confirming the upstream served the request.
Buyer sidecar — HoldSign/ReleaseSpend auth lifecycle
internal/x402/buyer/signer.go now uses a hold-confirm-release flow to prevent auth drain on failure:
HoldSign: removes an auth from the pool without persisting its nonce
ConfirmSpend: persists the nonce as consumed only after the upstream returns < 400
ReleaseSpend: returns the held auth to the pool on upstream failure
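The lifecycle can be sketched as an in-memory pool. `Pool`, `Auth`, `NewPool`, and `Counts` are hypothetical names; `onConsume` mirrors the persistence callback named in the first commit and is invoked only from ConfirmSpend, so a released auth is never recorded as spent. The real PreSignedSigner also builds the payment payload in HoldSign, which is omitted here.

```go
package main

import (
	"errors"
	"sync"
)

// Auth is a stand-in for one pre-signed authorization.
type Auth struct{ Nonce string }

var ErrPoolEmpty = errors.New("auth pool exhausted")

// Pool holds unused auths plus the ones currently held for an in-flight request.
type Pool struct {
	mu        sync.Mutex
	available []Auth
	held      map[string]Auth
	spent     int
	onConsume func(nonce string) // persists a nonce as spent
}

func NewPool(auths []Auth, onConsume func(string)) *Pool {
	return &Pool{available: auths, held: map[string]Auth{}, onConsume: onConsume}
}

// HoldSign pops the next auth without persisting its nonce.
func (p *Pool) HoldSign() (Auth, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.available) == 0 {
		return Auth{}, ErrPoolEmpty
	}
	a := p.available[0]
	p.available = p.available[1:]
	p.held[a.Nonce] = a
	p.spent++
	return a, nil
}

// ConfirmSpend persists the nonce once the upstream has answered < 400.
func (p *Pool) ConfirmSpend(nonce string) {
	p.mu.Lock()
	_, ok := p.held[nonce]
	delete(p.held, nonce)
	p.mu.Unlock()
	if ok && p.onConsume != nil {
		p.onConsume(nonce)
	}
}

// ReleaseSpend puts a held auth back at the front and undoes the spent counter.
func (p *Pool) ReleaseSpend(nonce string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	a, ok := p.held[nonce]
	if !ok {
		return
	}
	delete(p.held, nonce)
	p.available = append([]Auth{a}, p.available...)
	p.spent--
}

// Counts reports the remaining and spent counters, similar to the sidecar's remaining=/spent= log fields.
func (p *Pool) Counts() (remaining, spent int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.available), p.spent
}
```

Pushing a released auth back to the front means the next request retries the same nonce, so the pool never skips over an auth that was never settled.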
The sidecar reads X-PAYMENT-RESPONSE as optional metadata but does not call the facilitator directly. Settlement is the seller's responsibility.
CA bundle fix
obol stack up and obol sell http now populate the ca-certificates ConfigMap in the x402 namespace. The distroless verifier has no CA store — without this, all payment verification fails with x509: certificate signed by unknown authority.
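For a Go binary, an empty trust store surfaces as exactly this x509 error on the first TLS call. A hypothetical fail-fast check, assuming the bundle is available as PEM bytes (for example read from the mounted ConfigMap), could look like the following. `httpClientWithCABundle` is illustrative only; the verifier itself is assumed to rely on the mounted file rather than on code like this.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"net/http"
)

// httpClientWithCABundle builds a client that trusts only the given PEM bundle.
// It fails fast on an empty or unparsable bundle instead of failing later on
// every TLS handshake with "certificate signed by unknown authority".
func httpClientWithCABundle(pem []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, errors.New("CA bundle contains no usable certificates")
	}
	return &http.Client{
		Transport: &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}},
	}, nil
}
```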
Linux RC5 bug fixes
Port detection: /proc/net/tcp fallback for privileged ports where net.Listen returns permission denied
dockerBridgeGatewayIP: FlagUp check + br-* fallback when docker0 is down
erc8004 ChainID: changed from string → int64 (Rust remote-signer expects u64)
obolup.sh: /dev/tty openability guard before interactive prompt
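The /proc/net/tcp fallback can be sketched as follows. In each row, local_address is a hex IP and a hex port, and state `0A` is TCP_LISTEN. The helper names are hypothetical and this is not the checkPortsAvailable code itself; the relevant property is that these files are world-readable, so the check works where binding a port below 1024 fails with permission denied. `strings.Cut` needs Go 1.18+.

```go
package main

import (
	"bufio"
	"os"
	"strconv"
	"strings"
)

// tcpListenState is TCP_LISTEN as printed in the "st" column of /proc/net/tcp.
const tcpListenState = "0A"

// listeningPortsFromProcNetTCP parses /proc/net/tcp (or tcp6) contents and
// returns the set of local ports in LISTEN state.
func listeningPortsFromProcNetTCP(contents string) map[int]bool {
	ports := map[int]bool{}
	sc := bufio.NewScanner(strings.NewReader(contents))
	header := true
	for sc.Scan() {
		if header { // first line holds column names
			header = false
			continue
		}
		fields := strings.Fields(sc.Text())
		if len(fields) < 4 || fields[3] != tcpListenState {
			continue
		}
		// fields[1] is local_address, e.g. "00000000:0050" (hex IP : hex port).
		_, portHex, ok := strings.Cut(fields[1], ":")
		if !ok {
			continue
		}
		port, err := strconv.ParseUint(portHex, 16, 16)
		if err != nil {
			continue
		}
		ports[int(port)] = true
	}
	return ports
}

// privilegedPortInUse checks both IPv4 and IPv6 tables for a listener on port.
func privilegedPortInUse(port int) (bool, error) {
	for _, path := range []string{"/proc/net/tcp", "/proc/net/tcp6"} {
		data, err := os.ReadFile(path)
		if err != nil {
			if os.IsNotExist(err) {
				continue
			}
			return false, err
		}
		if listeningPortsFromProcNetTCP(string(data))[port] {
			return true, nil
		}
	}
	return false, nil
}
```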
Paid route /v1 fix
api_base in litellm-config and addLiteLLMModelEntry must be http://127.0.0.1:8402/v1. LiteLLM's OpenAI provider does not append /v1 — without it the buyer's mux returns 404 page not found.
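To make the path difference concrete: an OpenAI-compatible client forms the request URL by appending `chat/completions` to api_base and nothing else. The helper below is a hypothetical stand-in for that behaviour, not LiteLLM code, using Go's `url.JoinPath` (Go 1.19+).

```go
package main

import "net/url"

// chatCompletionsURL appends "chat/completions" to api_base, adding no /v1.
func chatCompletionsURL(apiBase string) (string, error) {
	return url.JoinPath(apiBase, "chat/completions")
}
```

With `http://127.0.0.1:8402/v1` this yields `http://127.0.0.1:8402/v1/chat/completions`; with the bare `http://127.0.0.1:8402` it yields `http://127.0.0.1:8402/chat/completions`, the path that hit the mux's default 404.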
Known limitations
LiteLLM restart required after buy.py buy
The subPath mount means ConfigMap updates never reach a running pod. Run obol kubectl rollout restart deployment/litellm -n llm after every purchase.
Direct X-PAYMENT through Traefik
Not a supported production path: ForwardAuth verifies but does not settle. Use buy.py buy, then call the paid/ models through LiteLLM.
LiteLLM replicas > 1
The auth pool is pod-local. Do not scale LiteLLM beyond one replica until that state is shared.