feat(x402): buyer-side settlement, HoldSign/ReleaseSpend auth lifecycle #343

Open
HananINouman wants to merge 7 commits into main from feat/x402-buyer-settlement-hold-sign

HananINouman commented Apr 15, 2026

ForwardAuth hardening

Fixed ForwardAuth facilitator payload format (internal/x402/forwardauth.go) — sends decoded JSON, not raw base64. Standardised verifyOnly: true cluster-wide (x402.yaml, setup.go). This is permanent: settling inside ForwardAuth fires before Traefik forwards to the upstream, so it would debit the payer before confirming the upstream served the request.
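
To illustrate the payload fix, here is a minimal sketch of decoding the base64 X-PAYMENT header and embedding the result as a nested JSON object in the facilitator request. The function name buildVerifyBody and the single-field request body are assumptions for illustration, not the code in forwardauth.go.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// buildVerifyBody decodes a base64 X-PAYMENT header value and embeds the
// result as a nested JSON object. The function name and the single-field
// body are illustrative assumptions.
func buildVerifyBody(xPayment string) ([]byte, error) {
	decoded, err := base64.StdEncoding.DecodeString(xPayment)
	if err != nil {
		return nil, fmt.Errorf("X-PAYMENT is not valid base64: %w", err)
	}
	// json.RawMessage is emitted as-is, so the facilitator receives an object
	// rather than a quoted base64 string. Marshal returns an error if the
	// decoded bytes are not valid JSON.
	return json.Marshal(map[string]json.RawMessage{
		"paymentPayload": json.RawMessage(decoded),
	})
}

func main() {
	header := base64.StdEncoding.EncodeToString([]byte(`{"scheme":"exact"}`))
	body, err := buildVerifyBody(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```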

Buyer sidecar — HoldSign/ReleaseSpend auth lifecycle

internal/x402/buyer/signer.go now uses a hold-confirm-release flow to prevent auth drain on failure:

- HoldSign: removes an auth from the pool without persisting the nonce
- ConfirmSpend: persists the nonce as consumed only after the upstream returns a status < 400
- ReleaseSpend: returns the auth to the pool on upstream failure

The sidecar reads X-PAYMENT-RESPONSE as optional metadata but does not call the facilitator directly. Settlement is the seller's responsibility.
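
From the caller's side, the lifecycle could look roughly like the sketch below. The spendLifecycle interface, its method signatures, payWith, and the recorder fake are all assumptions for illustration; the real signatures live in signer.go.

```go
package main

import "fmt"

// spendLifecycle mirrors the three phases described above. The signatures
// are assumed for this sketch.
type spendLifecycle interface {
	HoldSign() (nonce string, err error)
	ConfirmSpend(nonce string) error
	ReleaseSpend(nonce string)
}

// payWith holds an auth, runs the paid call, then confirms or releases the
// auth depending on the upstream status code.
func payWith(s spendLifecycle, call func(nonce string) (int, error)) (int, error) {
	nonce, err := s.HoldSign()
	if err != nil {
		return 0, fmt.Errorf("hold: %w", err)
	}
	status, err := call(nonce)
	if err != nil || status >= 400 {
		s.ReleaseSpend(nonce) // upstream failed: the auth goes back to the pool
		return status, err
	}
	if err := s.ConfirmSpend(nonce); err != nil {
		return status, fmt.Errorf("confirm: %w", err)
	}
	return status, nil
}

// recorder is a fake lifecycle that records which phases ran.
type recorder struct{ calls []string }

func (r *recorder) HoldSign() (string, error) {
	r.calls = append(r.calls, "hold")
	return "nonce-1", nil
}

func (r *recorder) ConfirmSpend(string) error {
	r.calls = append(r.calls, "confirm")
	return nil
}

func (r *recorder) ReleaseSpend(string) {
	r.calls = append(r.calls, "release")
}

func main() {
	ok := &recorder{}
	payWith(ok, func(string) (int, error) { return 200, nil })
	fmt.Println(ok.calls)

	failed := &recorder{}
	payWith(failed, func(string) (int, error) { return 502, nil })
	fmt.Println(failed.calls)
}
```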

CA bundle fix

obol stack up and obol sell http now populate the ca-certificates ConfigMap in the x402 namespace. The distroless verifier has no CA store — without this, all payment verification fails with x509: certificate signed by unknown authority.

Linux RC5 bug fixes

- Port detection: fall back to /proc/net/tcp for privileged ports, where net.Listen returns permission denied
- dockerBridgeGatewayIP: check FlagUp and fall back to a br-* bridge when docker0 is down
- erc8004 ChainID: changed from string to int64 (the Rust remote-signer expects u64)
- obolup.sh: check that /dev/tty can be opened before the interactive prompt
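
As an illustration of the /proc/net/tcp fallback, the sketch below scans the table for a socket in the LISTEN state (st == 0A) on a given local port. The helper name listeningOn is hypothetical, and this version only covers IPv4; IPv6 listeners appear in /proc/net/tcp6.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// listeningOn reports whether a /proc/net/tcp table contains a socket in
// the LISTEN state (st == 0A) bound to the given local port.
func listeningOn(table string, port int) bool {
	lines := strings.Split(table, "\n")
	for _, line := range lines[1:] { // the first line is the column header
		fields := strings.Fields(line)
		if len(fields) < 4 || fields[3] != "0A" {
			continue
		}
		// local_address is HEXIP:HEXPORT, e.g. 00000000:0050 for *:80.
		i := strings.LastIndexByte(fields[1], ':')
		if i < 0 {
			continue
		}
		p, err := strconv.ParseUint(fields[1][i+1:], 16, 16)
		if err == nil && int(p) == port {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/net/tcp")
	if err != nil {
		fmt.Println("cannot read /proc/net/tcp (not Linux?):", err)
		return
	}
	for _, port := range []int{80, 443} {
		fmt.Printf("port %d in use: %v\n", port, listeningOn(string(data), port))
	}
}
```

Reading the table needs no bind attempt, which is why it works for ports below 1024 when the process is unprivileged.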

Paid route /v1 fix

api_base in litellm-config and addLiteLLMModelEntry must be http://127.0.0.1:8402/v1. LiteLLM's OpenAI provider does not append /v1 — without it the buyer's mux returns 404 page not found.
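
The 404 can be reproduced with a plain Go ServeMux. The sketch below assumes a mux that registers only the /v1-prefixed route and shows that a request without the prefix falls through to the default 404 handler; newMux and statusFor are hypothetical helpers, not the buyer's actual code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux registers a trivial 200 handler on each of the given exact paths.
func newMux(paths ...string) *http.ServeMux {
	mux := http.NewServeMux()
	for _, p := range paths {
		mux.HandleFunc(p, func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		})
	}
	return mux
}

// statusFor sends a POST to path through the mux and returns the status code.
func statusFor(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, path, nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	v1Only := newMux("/v1/chat/completions")
	// api_base without /v1: the client calls /chat/completions.
	fmt.Println(statusFor(v1Only, "/chat/completions"))
	// api_base with /v1: the client calls /v1/chat/completions.
	fmt.Println(statusFor(v1Only, "/v1/chat/completions"))
}
```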

Known limitations

LiteLLM restart required after buy.py buy
The subPath mount means ConfigMap updates never reach a running pod. Run obol kubectl rollout restart deployment/litellm -n llm after every purchase.

Direct X-PAYMENT through Traefik
Not a supported production path: ForwardAuth verifies but does not settle. Use buy.py buy and the paid/* route via LiteLLM instead.

LiteLLM replicas > 1
Auth pool is pod-local. Do not scale until state is shared.

## Buyer sidecar — auth drain fix (signer.go)

Splits the old single Sign() call into a three-phase lifecycle to prevent
auth pool drain when a paid request fails after the auth has been popped:

- HoldSign: pops an auth from the pool and builds the payment payload but
  does NOT mark it consumed (onConsume is not called).
- ConfirmSpend: persists the nonce as consumed only after upstream response
  succeeds AND facilitator settlement succeeds.
- ReleaseSpend: returns the held auth to the front of the pool and
  decrements the spent counter — used on upstream ≥400 or settlement
  failure so the auth can be retried.
- Sign() is kept as a convenience wrapper: HoldSign → ConfirmSpend.
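
A minimal sketch of a pool with this lifecycle follows. The type and field names are hypothetical, and it assumes HoldSign increments the spent counter that ReleaseSpend later decrements; the actual bookkeeping in signer.go may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type auth struct{ Nonce string }

// pool is a hypothetical stand-in for the pre-signed auth pool.
type pool struct {
	mu        sync.Mutex
	auths     []auth
	spent     int
	onConsume func(nonce string) // persists the nonce as consumed
}

var errPoolEmpty = errors.New("auth pool empty")

// HoldSign pops the first auth but does not call onConsume.
func (p *pool) HoldSign() (auth, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.auths) == 0 {
		return auth{}, errPoolEmpty
	}
	a := p.auths[0]
	p.auths = p.auths[1:]
	p.spent++ // assumption: counted on hold, undone on release
	return a, nil
}

// ConfirmSpend persists the nonce once the paid request has succeeded.
func (p *pool) ConfirmSpend(a auth) {
	if p.onConsume != nil {
		p.onConsume(a.Nonce)
	}
}

// ReleaseSpend puts the auth back at the front so it is retried first.
func (p *pool) ReleaseSpend(a auth) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.auths = append([]auth{a}, p.auths...)
	p.spent--
}

// Sign is the convenience wrapper: HoldSign followed by ConfirmSpend.
func (p *pool) Sign() (auth, error) {
	a, err := p.HoldSign()
	if err != nil {
		return auth{}, err
	}
	p.ConfirmSpend(a)
	return a, nil
}

func (p *pool) Remaining() int { p.mu.Lock(); defer p.mu.Unlock(); return len(p.auths) }
func (p *pool) Spent() int     { p.mu.Lock(); defer p.mu.Unlock(); return p.spent }

func main() {
	p := &pool{
		auths:     []auth{{Nonce: "a"}, {Nonce: "b"}},
		onConsume: func(n string) { fmt.Println("consumed", n) },
	}
	held, _ := p.HoldSign()
	p.ReleaseSpend(held) // upstream failed: nothing is persisted
	fmt.Println("remaining:", p.Remaining(), "spent:", p.Spent())
}
```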

## Buyer sidecar — direct facilitator settlement (settlement.go, proxy.go)

New settlement.go: buyer calls facilitator /settle directly when the
sell-side ForwardAuth is running verifyOnly=true and therefore returns no
X-PAYMENT-RESPONSE header.

- facilitatorSettle(): marshals PaymentPayload + PaymentRequirements, POSTs
  to <facilitatorURL>/settle, parses SettlementResponse.
- Uses a local defaultFacilitatorURL constant instead of importing
  internal/x402 to avoid an import cycle (x402 test files import buyer).
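
For reference, a direct settle call of this kind could be sketched as below. The request and response struct shapes, the field names, and the settleAgainstStub helper are assumptions for illustration; they are not taken from settlement.go, which a later commit in this PR removes.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Assumed wire shapes for this sketch.
type settleRequest struct {
	PaymentPayload      json.RawMessage `json:"paymentPayload"`
	PaymentRequirements json.RawMessage `json:"paymentRequirements"`
}

type settlementResponse struct {
	Success     bool   `json:"success"`
	ErrorReason string `json:"errorReason,omitempty"`
	Transaction string `json:"transaction,omitempty"`
}

// facilitatorSettle POSTs payload and requirements to <facilitatorURL>/settle.
func facilitatorSettle(ctx context.Context, facilitatorURL string, payload, requirements json.RawMessage) (*settlementResponse, error) {
	body, err := json.Marshal(settleRequest{PaymentPayload: payload, PaymentRequirements: requirements})
	if err != nil {
		return nil, fmt.Errorf("marshal settle request: %w", err)
	}
	url := strings.TrimRight(facilitatorURL, "/") + "/settle"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("facilitator returned HTTP %d", resp.StatusCode)
	}
	var out settlementResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode settlement response: %w", err)
	}
	if !out.Success {
		return &out, fmt.Errorf("settlement rejected: %s", out.ErrorReason)
	}
	return &out, nil
}

// settleAgainstStub exercises facilitatorSettle against an in-process stub
// so the sketch never touches a real facilitator.
func settleAgainstStub(succeed bool) (*settlementResponse, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/settle" {
			http.NotFound(w, r)
			return
		}
		resp := settlementResponse{Success: succeed}
		if succeed {
			resp.Transaction = "0xstub"
		} else {
			resp.ErrorReason = "insufficient_funds"
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	}))
	defer stub.Close()
	payload := json.RawMessage(`{"scheme":"exact"}`)
	return facilitatorSettle(context.Background(), stub.URL, payload, payload)
}

func main() {
	out, err := settleAgainstStub(true)
	fmt.Println(out, err)
}
```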

proxy.go changes:
- FacilitatorURL added to replayableX402Transport (sourced from per-upstream
  UpstreamConfig).
- selectAndHoldPayment(): replaces the old SelectAndSign path for
  PreSignedSigner — calls HoldSign instead of Sign.
- releaseHeldPreSignedSpend(): returns auth to pool on failure.
- ensureSettlement(): checks X-PAYMENT-RESPONSE first; if absent (verifyOnly
  sell side) calls facilitatorSettle directly.
- newErrorResponse(): synthetic 503 returned to LiteLLM on settlement failure.
- Full RoundTrip flow: first req → 402 → HoldSign → retry with X-PAYMENT →
  upstream ≥400: release → settlement fails: release + 503 → settlement
  succeeds: ConfirmSpend → return response.

## Buyer sidecar — FacilitatorURL per upstream (config.go)

Added FacilitatorURL string (json:"facilitatorURL,omitempty") to
UpstreamConfig so each upstream can override the facilitator. Falls back
to defaultFacilitatorURL when empty.
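
The fallback can be sketched as follows. Only the FacilitatorURL field and its JSON tag come from the description above; the placeholder default value, the method name effectiveFacilitatorURL, and resolveFacilitator are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Placeholder value for the sketch, not the project's real default.
const defaultFacilitatorURL = "https://facilitator.example"

// UpstreamConfig shows only the new field; other fields are omitted.
type UpstreamConfig struct {
	FacilitatorURL string `json:"facilitatorURL,omitempty"`
}

// effectiveFacilitatorURL returns the per-upstream override, or the default
// when the field is empty or absent.
func (c UpstreamConfig) effectiveFacilitatorURL() string {
	if c.FacilitatorURL == "" {
		return defaultFacilitatorURL
	}
	return c.FacilitatorURL
}

// resolveFacilitator parses one upstream entry and applies the fallback.
func resolveFacilitator(rawUpstream string) (string, error) {
	var cfg UpstreamConfig
	if err := json.Unmarshal([]byte(rawUpstream), &cfg); err != nil {
		return "", fmt.Errorf("parse upstream config: %w", err)
	}
	return cfg.effectiveFacilitatorURL(), nil
}

func main() {
	for _, raw := range []string{`{}`, `{"facilitatorURL":"https://other.example"}`} {
		url, err := resolveFacilitator(raw)
		fmt.Println(url, err)
	}
}
```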

## ServiceOffer controller — propagate facilitatorURL (purchase.go, purchase_helpers.go)

reconcilePurchaseConfigure now writes "facilitatorURL": x402pkg.DefaultFacilitatorURL
into the upstream map stored in the buyer ConfigMap so the sidecar knows
which facilitator to call for settlement.

addLiteLLMModelEntry (purchase_helpers.go) now hardcodes api_base as
http://127.0.0.1:8402/v1 (with /v1 suffix) for all explicit paid/* entries
it writes — ensures LiteLLM routes to the correct path even on clusters
that predate the template fix.

## Infrastructure template fixes (llm.yaml, x402.yaml)

llm.yaml: fixed api_base for the paid/* wildcard route from
http://127.0.0.1:8402 to http://127.0.0.1:8402/v1 — missing /v1 caused
LiteLLM to receive 404s from the buyer sidecar.

x402.yaml: changed verifyOnly default from false to true. The ForwardAuth
verifier must only verify payments; settlement on the auth hop breaks
Traefik and the facilitator rejects base64-encoded payloads. Settlement is
now the buyer sidecar's responsibility. Also added optional: true to CA
cert volume mount and RBAC for litellm-secrets so the controller can
hot-add paid models.

## ForwardAuth verifier hardening (forwardauth.go)

Added validation in facilitatorVerify and facilitatorSettle: rejects
empty or non-JSON paymentPayload before sending to facilitator. Facilitators
expect a JSON object; sending a base64 string causes unsupported_scheme.
Renamed parameter payloadBytes → paymentPayloadJSON for clarity.
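
A check along these lines might look like the sketch below. The helper name validatePaymentPayloadJSON and the error messages are hypothetical; only the rule (reject empty or non-JSON-object payloads) comes from the description above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// validatePaymentPayloadJSON rejects payloads that are empty or are not a
// JSON object, such as a raw or quoted base64 string.
func validatePaymentPayloadJSON(paymentPayloadJSON []byte) error {
	trimmed := bytes.TrimSpace(paymentPayloadJSON)
	if len(trimmed) == 0 {
		return errors.New("payment payload is empty")
	}
	if trimmed[0] != '{' || !json.Valid(trimmed) {
		return errors.New("payment payload must be a JSON object")
	}
	return nil
}

func main() {
	samples := [][]byte{
		[]byte(`{"scheme":"exact"}`),       // accepted
		[]byte(`eyJzY2hlbWUiOiJleGFjdCJ9`), // raw base64: rejected
		[]byte(``),                         // empty: rejected
	}
	for _, s := range samples {
		fmt.Printf("%q -> %v\n", s, validatePaymentPayloadJSON(s))
	}
}
```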

## Standalone inference gateway (gateway.go)

Added NoPaymentGate bool to GatewayConfig — disables the x402 payment
middleware when the gateway runs behind the cluster's x402 verifier (avoids
double-gating). Added ValidateFacilitatorURL check on startup. Added
normalizeServicePrefixedPath() to strip /services/<name>/ prefix so
requests reach the correct OpenAI routes.
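
One way to strip the prefix is sketched below. The handling of edge cases (a bare /services/<name> with nothing after it maps to /) is an assumption; the real function in gateway.go may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeServicePrefixedPath removes a leading /services/<name> segment
// pair so the remainder matches the OpenAI-style routes.
func normalizeServicePrefixedPath(p string) string {
	const prefix = "/services/"
	if !strings.HasPrefix(p, prefix) {
		return p // not service-prefixed: leave untouched
	}
	rest := p[len(prefix):] // "<name>/v1/..."
	i := strings.IndexByte(rest, '/')
	if i < 0 {
		return "/" // assumption: nothing after the service name maps to root
	}
	return rest[i:]
}

func main() {
	for _, p := range []string{
		"/services/qwen/v1/chat/completions",
		"/v1/models",
		"/services/qwen",
	} {
		fmt.Printf("%s -> %s\n", p, normalizeServicePrefixedPath(p))
	}
}
```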

## Test fixes (proxy_test.go, buy_side_test.go, signer_test.go, forwardauth_test.go)

All tests updated for the new settlement flow. Mock upstream 200 responses
now set X-PAYMENT-RESPONSE so ensureSettlement() short-circuits via the
header path instead of calling the real facilitator over the network.
Added X-PAYMENT-RESPONSE to startMockX402Seller in buy_side_test.go.
TestLLMTemplate_IncludesPaidRouteAndBuyerSidecar updated to assert
api_base includes /v1 (stack_test.go).
flow-06-sell-setup.sh assertion updated to expect verifyOnly: true.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HananINouman marked this pull request as draft April 15, 2026 14:35
…ell http

The x402-verifier image is distroless (no CA store). Without the
ca-certificates ConfigMap populated from the host, the verifier cannot
TLS-verify calls to https://x402.gcp.obol.tech, causing all paid
requests to fail with "Payment verification failed" (x509: certificate
signed by unknown authority).

Root cause: `obol sell http` never called Setup()/populateCABundle().
Only `obol sell pricing` triggered it, so running sell http on a fresh
cluster left the CA bundle empty.

Fix:
- internal/x402/setup.go: call populateCABundle inside EnsureVerifier()
  so it runs whenever the verifier is deployed; export PopulateCABundle
  for callers that bypass EnsureVerifier
- internal/stack/stack.go: call x402verifier.PopulateCABundle after
  infrastructure deployment during `obol stack up`
- cmd/obol/sell.go: call x402verifier.PopulateCABundle in sellHTTPCommand
  before creating the ServiceOffer CR

Also updates CLAUDE.md with pitfall #7 (LiteLLM hot-add broken with
subPath mount), pitfall #8 (CA bundle), sell http flag reference, direct
HTTP buy format notes, and buy.py command reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OisinKyne requested a review from bussyjd April 15, 2026 21:45
HananINouman and others added 3 commits April 16, 2026 17:36
x402 / buyer sidecar:
- Revert direct-facilitator settlement from proxy RoundTrip; settlement
  stays as X-PAYMENT-RESPONSE header read (matching main semantics)
- Remove now-dead settlement.go and FacilitatorURL from UpstreamConfig
- Keep HoldSign/ReleaseSpend auth lifecycle in PreSignedSigner
- verifyOnly: true permanent for Traefik ForwardAuth path
- api_base for paid/* LiteLLM route reverted to bare :8402 (buyer proxy
  registers both /chat/completions and /v1/chat/completions)

Linux rc5 bug fixes:
- checkPortsAvailable: use /proc/net/tcp on Linux to detect privileged
  port occupancy when net.Listen returns permission denied
- dockerBridgeGatewayIP: check FlagUp before using docker0; fall back to
  active br-* bridge if docker0 is down
- erc8004 SignTxRequest: chain_id serialised as int64 (was string),
  fixing HTTP 422 from Rust remote-signer expecting u64
- obolup.sh: test /dev/tty openability before interactive prompt to
  prevent crash in non-interactive environments
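
The chain_id change comes down to how the field is encoded on the wire. The sketch below contrasts the two encodings; both struct types are trimmed to the one field and are hypothetical, and 8453 is just an example chain ID.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// legacySignTxRequest models the old encoding: chain_id as a JSON string.
type legacySignTxRequest struct {
	ChainID string `json:"chain_id"`
}

// signTxRequest models the fixed encoding: chain_id as a JSON number, which
// a Rust deserializer expecting u64 can accept.
type signTxRequest struct {
	ChainID int64 `json:"chain_id"`
}

// mustJSON marshals v and panics on failure; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(legacySignTxRequest{ChainID: "8453"})) // quoted string
	fmt.Println(mustJSON(signTxRequest{ChainID: 8453}))         // bare number
}
```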

Docs:
- CLAUDE.md/README: clarify direct X-PAYMENT via Traefik is dev/debug
  only (no settlement); x402-buyer and obol sell inference are the
  supported buyer paths
- Fix pitfall numbering gap (6,8 → 6,7) after removing stale /v1 note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ider does not add it automatically

Without /v1 in api_base, LiteLLM calls /chat/completions on the buyer
sidecar. The buyer's mux returns Go's default 404 "page not found" which
LiteLLM surfaces as OpenAIException - 404 page not found. The /v1 suffix
is required in the static litellm-config ConfigMap template and in the
controller's addLiteLLMModelEntry helper. Verified end-to-end: paid/qwen3.5:4b
returns HTTP 200, sidecar shows remaining=2 spent=1 after one call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The subPath mount means ConfigMap updates NEVER propagate to a running
LiteLLM pod. The restart step after buy.py buy is mandatory, not optional.
Expand pitfall #7 with symptom, required command, and reason so Claude Code
stops suggesting "wait for ConfigMap propagation" as a workaround.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HananINouman marked this pull request as ready for review April 16, 2026 16:55