Add Auth0 client_credentials auth to simulation gateway calls #3471
Merged
The simulation gateway now requires a bearer JWT on every write and job-status endpoint (policyengine-api-v2 PR #458). Without this change, the next prod deploy of policyengine-api-v2 would break every society-wide report: v1 calls the gateway unauthenticated and would start getting 403s.

Introduce a GatewayAuthTokenProvider that fetches a client_credentials access token from Auth0, caches it in-process, and refreshes a minute before expiry. The provider is thread-safe so the existing worker processes can share a single instance. A GatewayBearerAuth adapter attaches the token to every httpx request as Authorization: Bearer.

SimulationAPIModal wires the auth up in __init__ only when the four GATEWAY_AUTH_* env vars are all present, so local/dev runs against a gateway that has GATEWAY_AUTH_DISABLED=1 continue to work without changes.

The deploy pipeline now plumbs the four env vars through the standard pattern (push.yml -> make deploy -> gcp/export.py -> Dockerfile substitution) so App Engine receives them at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
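The cache-and-refresh behavior described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from gateway_auth.py: the class name `CachedTokenProvider`, the injected `fetch` callable (standing in for the HTTP call to Auth0), and the injectable `clock` are all hypothetical. Only the 60-second refresh margin and the use of a lock come from the PR text.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple

REFRESH_MARGIN_SECONDS = 60.0  # "refreshes a minute before expiry"


class CachedTokenProvider:
    """Hypothetical stand-in for GatewayAuthTokenProvider."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch() returns (access_token, expires_in_seconds). In the real
        # module this would be a request to Auth0; here it is injected so
        # the sketch runs without a network (an assumption of this example).
        self._fetch = fetch
        self._clock = clock
        self._lock = threading.Lock()
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get_token(self) -> str:
        # Holding the lock across the fetch makes concurrent callers wait
        # for a single fetch instead of each starting their own.
        with self._lock:
            now = self._clock()
            if self._token is None or now >= self._refresh_at:
                token, expires_in = self._fetch()
                self._token = token
                self._refresh_at = now + expires_in - REFRESH_MARGIN_SECONDS
            return self._token

    def invalidate(self) -> None:
        # Drop the cached token so the next get_token() refetches.
        with self._lock:
            self._token = None


# Usage with a fake fetcher and a fake clock.
calls: List[int] = []
fake_now = [0.0]


def fake_fetch() -> Tuple[str, float]:
    calls.append(1)
    return f"token-{len(calls)}", 3600.0


provider = CachedTokenProvider(fake_fetch, clock=lambda: fake_now[0])
first = provider.get_token()   # nothing cached yet, so this fetches
second = provider.get_token()  # served from the cache
fake_now[0] = 3545.0           # past 3600 - 60 = 3540, inside the margin
third = provider.get_token()   # fetches again
```

Injecting the clock keeps the expiry path testable without sleeping; whether the real provider does the same is not stated in the PR.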
Fixes from subagent review of PR #3471:

- Retry on 401: GatewayBearerAuth.auth_flow now yields twice, invalidating the cached token and refetching once if the gateway rejects the first request. Previously invalidate() existed but was never called, so a stale token after an Auth0 rotation would surface as a hard failure mid-report.
- Clamp expires_in: refuse to accept a missing expires_in and clamp any value below 2 * refresh margin. A pathologically short or zero value would otherwise drive the refresh-before-expiry check into perpetual refetching under concurrent load and DoS Auth0.
- Wrap network errors: httpx.RequestError from the token fetch now surfaces as GatewayAuthError so all failure modes honor the module's documented contract.
- Partial config is now a startup error: a new _require_all_or_none_gateway_auth_env() helper refuses to let the client construct if the four env vars are partially set. A typo in one GH secret name would otherwise silently downgrade to unauthenticated calls, which is the exact scenario this module exists to prevent.
- Log a WARNING when initialising without auth, so "we shipped v1 without the secrets set" shows up in observability instead of only surfacing as 403s from the gateway.

New tests cover: 401-retry flow, 2xx no-retry, network-error wrapping, missing-expires_in, zero-expires_in clamp, 20-thread concurrent fetch (single call), all-none / all-set / partial-set env validation.

Also switched the SimulationAPIModal env-var tests to monkeypatch for isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
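The expires_in rule from the list above can be illustrated with a short sketch. It assumes that "clamp" means raising short values up to a floor of twice the refresh margin, which the commit message implies but does not spell out. The function name `usable_lifetime` is hypothetical, and `GatewayAuthError` is redefined locally only so the example is self-contained.

```python
from typing import Any, Mapping

REFRESH_MARGIN_SECONDS = 60.0
# Floor taken from the commit message: 2 * refresh margin.
MIN_EXPIRES_IN_SECONDS = 2.0 * REFRESH_MARGIN_SECONDS


class GatewayAuthError(Exception):
    """Local stand-in for the module's error type."""


def usable_lifetime(payload: Mapping[str, Any]) -> float:
    """Return the lifetime to cache a token for (hypothetical helper)."""
    raw = payload.get("expires_in")
    if raw is None:
        # A response without expires_in is rejected rather than guessed at.
        raise GatewayAuthError("token response has no expires_in")
    # Assumption: short values are raised to the floor, not rejected.
    return max(float(raw), MIN_EXPIRES_IN_SECONDS)


normal = usable_lifetime({"access_token": "abc", "expires_in": 86400})
clamped = usable_lifetime({"access_token": "abc", "expires_in": 0})
```

With a 60-second margin and a 120-second floor, a cached token is always served for at least 60 seconds before the next refresh, so a zero or tiny expires_in cannot turn every call into a new token request. A token whose true lifetime is shorter than the floor would then be rejected by the gateway and recovered by the 401 retry described in the first bullet.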
MaxGhenis added a commit that referenced this pull request on Apr 18, 2026
The #3471 revert removed the only pending fragment, so towncrier now exits non-zero ("No changelog fragments found") on every push and the Deploy API + Docker jobs stay skipped. Ship a no-op fragment so the next push triggers a clean versioning + deploy cycle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

The simulation gateway now requires a bearer JWT on every write and job-status endpoint (policyengine-api-v2 PR #458). Without this change, the next prod deploy of policyengine-api-v2 would break every society-wide report: v1 calls the gateway unauthenticated and would start getting 403s. This PR adds the client side so both can ship safely.

Changes

New module: policyengine_api/libs/gateway_auth.py

- GatewayAuthTokenProvider fetches a client_credentials access token from Auth0, caches it in-process, and refreshes 60 s before expiry. Thread-safe (threading.Lock) so worker processes can share a single instance.
- GatewayBearerAuth is a small httpx.Auth adapter that attaches Authorization: Bearer <token> to every request.
- GatewayAuthError for clear failure modes (misconfigured, Auth0 HTTP error, missing access_token).

Client wiring: policyengine_api/libs/simulation_api_modal.py

- SimulationAPIModal.__init__ now constructs a GatewayAuthTokenProvider and, when all four GATEWAY_AUTH_* env vars are present, passes a GatewayBearerAuth to httpx.Client(auth=...). Missing vars leave auth off (preserves local/dev behavior against a gateway running with GATEWAY_AUTH_DISABLED=1).

Deploy plumbing: .github/workflows/push.yml, gcp/export.py, gcp/policyengine_api/Dockerfile

- make deploy → export.py substitutes literal placeholders in the Dockerfile → App Engine runtime reads them as env vars.
- Four new vars: GATEWAY_AUTH_ISSUER, GATEWAY_AUTH_AUDIENCE, GATEWAY_AUTH_CLIENT_ID, GATEWAY_AUTH_CLIENT_SECRET.

Tests: tests/unit/libs/test_gateway_auth.py (new), tests/unit/libs/test_simulation_api_modal.py

- […] configured, first-call fetch, caching, expiry refresh, trailing-slash normalization, Auth0 HTTP errors, missing-access-token, invalidate(), and the bearer header attachment.
- […] SimulationAPIModal.__init__ asserting the auth kwarg is wired correctly when env vars are present / absent.

Test plan

- make test green (includes existing and new unit tests)
- […] policyengine-api-v2 gateway locally with SIMULATION_API_URL pointed at beta)

Order of operations

This PR is safe to merge before the policyengine-api-v2 gateway actually enforces auth against real prod traffic, because:

- […] SimulationAPIModal attaches no auth and behavior is unchanged.

After policyengine-api-v2's deploy pipeline unsticks (tracked in PolicyEngine/policyengine-api-v2#461), set the four GATEWAY_AUTH_* GH Action secrets on this repo and deploy.

🤖 Generated with Claude Code
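The wiring rule under "Client wiring", combined with the follow-up commit's all-or-none check, can be sketched roughly as below. The four variable names come from the PR. The function name `gateway_auth_enabled`, the use of `RuntimeError`, and treating an empty string as unset are assumptions made for illustration, and may differ from what `_require_all_or_none_gateway_auth_env()` actually does.

```python
import logging
from typing import List, Mapping

logger = logging.getLogger(__name__)

GATEWAY_AUTH_VARS = (
    "GATEWAY_AUTH_ISSUER",
    "GATEWAY_AUTH_AUDIENCE",
    "GATEWAY_AUTH_CLIENT_ID",
    "GATEWAY_AUTH_CLIENT_SECRET",
)


def gateway_auth_enabled(env: Mapping[str, str]) -> bool:
    """Return True if auth should be attached (hypothetical helper)."""
    # Assumption: an empty value counts as unset.
    missing: List[str] = [name for name in GATEWAY_AUTH_VARS if not env.get(name)]
    if len(missing) == len(GATEWAY_AUTH_VARS):
        # None set: run unauthenticated, but say so loudly.
        logger.warning("Gateway auth env vars not set; calling gateway without auth")
        return False
    if missing:
        # Partially set: fail at startup instead of silently dropping auth.
        raise RuntimeError("Partial gateway auth config, missing: " + ", ".join(missing))
    return True


all_set = {name: "value" for name in GATEWAY_AUTH_VARS}
none_result = gateway_auth_enabled({})
full_result = gateway_auth_enabled(all_set)
```

In a real process the mapping would be `os.environ`. Passing it in as a parameter keeps the check easy to test with plain dictionaries.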