Add Auth0 client_credentials auth to simulation gateway calls #3471

Merged
MaxGhenis merged 2 commits into master from add-gateway-auth-client on Apr 18, 2026

@MaxGhenis
Collaborator

Summary

The simulation gateway now requires a bearer JWT on every write and job-status endpoint (policyengine-api-v2 PR #458). Without this change, the next prod deploy of policyengine-api-v2 would break every society-wide report: v1 calls the gateway unauthenticated and would start getting 403s. This PR adds the client side so both can ship safely.

Changes

New module: policyengine_api/libs/gateway_auth.py

  • GatewayAuthTokenProvider fetches a client_credentials access token from Auth0, caches it in-process, and refreshes 60 s before expiry. Thread-safe (threading.Lock) so worker processes can share a single instance.
  • GatewayBearerAuth is a small httpx.Auth adapter that attaches Authorization: Bearer <token> to every request.
  • GatewayAuthError for clear failure modes (misconfigured, Auth0 HTTP error, missing access_token).

Client wiring: policyengine_api/libs/simulation_api_modal.py

  • SimulationAPIModal.__init__ now constructs a GatewayAuthTokenProvider and, when all four GATEWAY_AUTH_* env vars are present, passes a GatewayBearerAuth to httpx.Client(auth=...). Missing vars leave auth off (preserves local/dev behavior against a gateway running with GATEWAY_AUTH_DISABLED=1).
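
As a rough illustration of the "all four present, otherwise auth off" rule, the check might look like the sketch below. The function name `gateway_auth_settings` is hypothetical, and treating an empty string as missing is an assumption; the follow-up commit on this PR tightens partial configuration into a startup error, which this sketch does not model.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names as listed in this PR's deploy plumbing section.
GATEWAY_AUTH_ENV_VARS = (
    "GATEWAY_AUTH_ISSUER",
    "GATEWAY_AUTH_AUDIENCE",
    "GATEWAY_AUTH_CLIENT_ID",
    "GATEWAY_AUTH_CLIENT_SECRET",
)


def gateway_auth_settings(
    env: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, str]]:
    """Return the four settings if all are set, else None (auth stays off).

    Hypothetical helper: empty strings count as unset (an assumption).
    """
    values = {name: env.get(name, "") for name in GATEWAY_AUTH_ENV_VARS}
    if all(values.values()):
        return values
    return None
```

A caller would pass an auth object to the HTTP client only when this returns a dict, and construct the client without auth otherwise.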

Deploy plumbing: .github/workflows/push.yml, gcp/export.py, gcp/policyengine_api/Dockerfile

  • Follows the established pattern for other secrets: GH Action secret → env var on make deploy → export.py substitutes literal placeholders in the Dockerfile → App Engine runtime reads them as env vars. Four new vars: GATEWAY_AUTH_ISSUER, GATEWAY_AUTH_AUDIENCE, GATEWAY_AUTH_CLIENT_ID, GATEWAY_AUTH_CLIENT_SECRET.
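
The placeholder substitution step could be sketched as below. This is not the contents of gcp/export.py: the `__NAME__` placeholder syntax and the `render_dockerfile` function are hypothetical, and substituting an empty string for an unset variable is an assumption made so that a deploy without the secrets still renders.

```python
from typing import Mapping

GATEWAY_AUTH_ENV_VARS = (
    "GATEWAY_AUTH_ISSUER",
    "GATEWAY_AUTH_AUDIENCE",
    "GATEWAY_AUTH_CLIENT_ID",
    "GATEWAY_AUTH_CLIENT_SECRET",
)


def render_dockerfile(template: str, env: Mapping[str, str]) -> str:
    """Replace each literal placeholder with the matching env value.

    Placeholder form `__NAME__` is hypothetical; unset variables become
    empty strings (an assumption, so auth simply stays off at runtime).
    """
    rendered = template
    for name in GATEWAY_AUTH_ENV_VARS:
        rendered = rendered.replace(f"__{name}__", env.get(name, ""))
    return rendered
```

For example, a template line `ENV GATEWAY_AUTH_ISSUER=__GATEWAY_AUTH_ISSUER__` would be rendered with the issuer value taken from the deploy environment.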

Tests: tests/unit/libs/test_gateway_auth.py (new), tests/unit/libs/test_simulation_api_modal.py

  • 12 new tests covering configured, first-call fetch, caching, expiry refresh, trailing-slash normalization, Auth0 HTTP errors, missing-access-token, invalidate(), and the bearer header attachment.
  • 2 new tests on SimulationAPIModal.__init__ asserting the auth kwarg is wired correctly when env vars are present / absent.

Test plan

  • make test green (includes existing and new unit tests)
  • Deploy to staging/beta GAE (or verify against beta policyengine-api-v2 gateway locally with SIMULATION_API_URL pointed at beta)
  • Trigger a society-wide report end-to-end: submit → poll → result
  • Confirm no 403/401 on gateway calls; confirm token fetch happens once per process then reuses from cache

Order of operations

This PR is safe to merge before the policyengine-api-v2 gateway actually enforces auth against real prod traffic, because:

  • If the four env vars are unset in the deploy env, SimulationAPIModal attaches no auth and behavior is unchanged.
  • If they are set, the prod gateway does not yet reject unauthenticated requests (the gating is merged, but prod hasn't deployed since policyengine-api-v2 PR #458 merged and CI has been stuck).

After policyengine-api-v2's deploy pipeline unsticks (tracked in PolicyEngine/policyengine-api-v2#461), set the four GATEWAY_AUTH_* GH Action secrets on this repo and deploy.

🤖 Generated with Claude Code

MaxGhenis and others added 2 commits April 17, 2026 20:51
The simulation gateway now requires a bearer JWT on every write and
job-status endpoint (policyengine-api-v2 PR #458). Without this change,
the next prod deploy of policyengine-api-v2 would break every
society-wide report: v1 calls the gateway unauthenticated and would
start getting 403s.

Introduce a GatewayAuthTokenProvider that fetches a client_credentials
access token from Auth0, caches it in-process, and refreshes a minute
before expiry. The provider is thread-safe so the existing worker
processes can share a single instance. A GatewayBearerAuth adapter
attaches the token to every httpx request as Authorization: Bearer.

SimulationAPIModal wires the auth up in __init__ only when the four
GATEWAY_AUTH_* env vars are all present, so local/dev runs against a
gateway that has GATEWAY_AUTH_DISABLED=1 continue to work without
changes.

The deploy pipeline now plumbs the four env vars through the standard
pattern (push.yml -> make deploy -> gcp/export.py -> Dockerfile
substitution) so App Engine receives them at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fixes from subagent review of PR #3471:

- Retry on 401: GatewayBearerAuth.auth_flow now yields twice, invalidating
  the cached token and refetching once if the gateway rejects the first
  request. Previously invalidate() existed but was never called, so a
  stale token after an Auth0 rotation would surface as a hard failure
  mid-report.

- Clamp expires_in: refuse to accept missing expires_in and clamp any
  value below 2 * refresh margin. A pathological short/zero value would
  otherwise drive the refresh-before-expiry check into perpetual
  refetching under concurrent load and DOS Auth0.

- Wrap network errors: httpx.RequestError from the token fetch now
  surfaces as GatewayAuthError so all failure modes honor the module's
  documented contract.

- Partial config is now a startup error: a new
  _require_all_or_none_gateway_auth_env() helper refuses to let the
  client construct if the four env vars are partially set. A typo in
  one GH secret name would otherwise silently downgrade to unauth'd
  calls, which is the exact scenario this module exists to prevent.

- Log a WARNING when initialising without auth, so "we shipped v1
  without the secrets set" shows up in observability instead of only
  surfacing as 403s from the gateway.
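
The retry-on-401 flow in the first fix can be illustrated with a plain generator that follows the same yield/send protocol as an httpx `Auth.auth_flow`. The sketch below is an assumption-laden stand-in: `Request`, `Response`, `StubProvider`, and `bearer_auth_flow` are hypothetical types and names so that it runs without httpx or a real gateway.

```python
from dataclasses import dataclass, field
from typing import Dict, Generator, Optional


@dataclass
class Request:
    """Hypothetical stand-in for an HTTP request object."""
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int


class StubProvider:
    """Hands out numbered tokens; invalidate() forces a fresh one."""

    def __init__(self) -> None:
        self._count = 0
        self._token: Optional[str] = None

    def get_token(self) -> str:
        if self._token is None:
            self._count += 1
            self._token = f"token-{self._count}"
        return self._token

    def invalidate(self) -> None:
        self._token = None


def bearer_auth_flow(
    provider: StubProvider, request: Request
) -> Generator[Request, Response, None]:
    # First attempt uses whatever token is cached.
    request.headers["Authorization"] = f"Bearer {provider.get_token()}"
    response = yield request
    if response.status_code == 401:
        # Token may be stale: drop it, refetch once, and resend.
        provider.invalidate()
        request.headers["Authorization"] = f"Bearer {provider.get_token()}"
        yield request
```

The generator yields at most twice, so a second 401 is returned to the caller rather than retried again.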

New tests cover: 401-retry flow, 2xx no-retry, network-error wrapping,
missing-expires_in, zero-expires_in clamp, 20-thread concurrent fetch
(single call), all-none / all-set / partial-set env validation. Also
switched the SimulationAPIModal env-var tests to monkeypatch for
isolation.
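
Two of the checks these tests exercise, the expires_in clamp and the all-or-none env validation, might look roughly like the sketch below. The function names are hypothetical (the commit names the real helper `_require_all_or_none_gateway_auth_env()`, whose body is not shown here), clamping upward to 2 × the refresh margin is one reading of the commit message, and raising `GatewayAuthError` for partial config is an assumption.

```python
from typing import Mapping

REFRESH_MARGIN_SECONDS = 60
MIN_LIFETIME_SECONDS = 2 * REFRESH_MARGIN_SECONDS

GATEWAY_AUTH_ENV_VARS = (
    "GATEWAY_AUTH_ISSUER",
    "GATEWAY_AUTH_AUDIENCE",
    "GATEWAY_AUTH_CLIENT_ID",
    "GATEWAY_AUTH_CLIENT_SECRET",
)


class GatewayAuthError(Exception):
    """Local stand-in for the module's error type."""


def effective_expires_in(payload: Mapping[str, object]) -> float:
    """Reject a missing expires_in; raise short lifetimes to the minimum.

    With a lifetime at or below the refresh margin, the refresh check
    would treat every cached token as stale and refetch on each call.
    """
    if "expires_in" not in payload:
        raise GatewayAuthError("token response is missing expires_in")
    return max(float(payload["expires_in"]), MIN_LIFETIME_SECONDS)


def require_all_or_none(env: Mapping[str, str]) -> bool:
    """Return True if all four vars are set, False if none are.

    A partial set raises, so a typo in one secret name fails loudly
    instead of silently disabling auth.
    """
    present = [name for name in GATEWAY_AUTH_ENV_VARS if env.get(name)]
    if present and len(present) != len(GATEWAY_AUTH_ENV_VARS):
        missing = sorted(set(GATEWAY_AUTH_ENV_VARS) - set(present))
        raise GatewayAuthError(f"partial gateway auth config, missing: {missing}")
    return bool(present)
```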

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit ddf4649 into master Apr 18, 2026
4 of 5 checks passed
MaxGhenis added a commit that referenced this pull request Apr 18, 2026
MaxGhenis added a commit that referenced this pull request Apr 18, 2026
The #3471 revert removed the only pending fragment, so towncrier now
exits non-zero ("No changelog fragments found") on every push and the
Deploy API + Docker jobs stay skipped. Ship a no-op fragment so the
next push triggers a clean versioning + deploy cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis deleted the add-gateway-auth-client branch April 19, 2026 16:21