fix(authz): skip create-check/grant on unauthenticated agent self-register#292
Conversation
…ister /agents/register is whitelisted, so deployed pods self-register on startup with no principal context. #270 added an enforced check(create)/grant on that path, which forwarded a None principal to agentex-auth -> _validate_principal raised 422 -> agent pods stuck in CrashLoopBackOff on startup. The agent already exists and is owned from build time (register-build runs before deploy-time register), so the create check and ownership grant don't apply on the self-registration path. Skip them when no creator is resolvable; an authenticated caller is still enforced.
| if isinstance(principal_context, dict): | ||
| return bool( | ||
| principal_context.get("user_id") | ||
| or principal_context.get("service_account_id") | ||
| ) | ||
| return bool( | ||
| getattr(principal_context, "user_id", None) | ||
| or getattr(principal_context, "service_account_id", None) | ||
| ) |
There was a problem hiding this comment.
getattr branch is never exercised with a truthy result in tests
_has_resolvable_creator has two branches: a dict path (lines 64–68) and a getattr path (lines 69–72). The tests cover None → getattr → False and {"user_id": "u"} → dict → True, but there is no test exercising getattr(..., "user_id") → True (i.e., a non-dict object with a user_id attribute). Since AgentexAuthPrincipalContext = Any and the production path may return a Pydantic model or another non-dict object from the auth gateway, the branch that would actually enforce authz for service-account-authenticated callers in production is untested. A test passing a MagicMock with user_id = "sa-1" set would close this gap.
Prompt To Fix With AI
This is a comment left during a code review.
Path: agentex/src/api/routes/agents.py
Line: 64-72
Comment:
**`getattr` branch is never exercised with a truthy result in tests**
`_has_resolvable_creator` has two branches: a `dict` path (lines 64–68) and a `getattr` path (lines 69–72). The tests cover `None → getattr → False` and `{"user_id": "u"} → dict → True`, but there is no test exercising `getattr(..., "user_id") → True` (i.e., a non-dict object with a `user_id` attribute). Since `AgentexAuthPrincipalContext = Any` and the production path may return a Pydantic model or another non-dict object from the auth gateway, the branch that would actually enforce authz for service-account-authenticated callers in production is untested. A test passing a `MagicMock` with `user_id = "sa-1"` set would close this gap.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
The legacy-authz integration test mocks agentex-auth, so it asserted the None-principal check/grant the real service 422s on -- masking the deployed-pod crashloop. Align the register/create portion: the whitelisted self-registration path forwards no check/grant (the agent is already owned from build time); the delete portion (authenticated) is unchanged. Unit tests cover the route guard directly: no/empty/creatorless principal skips check+grant (still returns the agent, no 422), while a dict- or object-shaped principal with a user/service id enforces both.
## Summary - Restore legacy `/agents/register` compatibility by accepting `principal_context` in the register request body. - Use the request-state principal when present, otherwise fall back to the body principal for create-check and ownership grant. - Keep the #292 behavior where register skips authz when no resolvable principal is provided, avoiding the unauthenticated self-register crashloop. - Regenerate the Agentex OpenAPI spec. ## Validation - `uv run pytest agentex/tests/unit/api/test_agents_authz.py -q` - `uv run ruff check agentex/src/api/routes/agents.py agentex/src/api/schemas/agents.py agentex/tests/unit/api/test_agents_authz.py` - `uv run ruff format --check agentex/src/api/routes/agents.py agentex/src/api/schemas/agents.py agentex/tests/unit/api/test_agents_authz.py` - `make gen-openapi`
Summary
Deployed agent pods were stuck in
CrashLoopBackOffon startup, 422-ingon
POST /agents/register./agents/registeris whitelisted (deployed pods self-register on startupwithout a user login), so the auth middleware leaves
principal_contextas
None. #270 added an enforcedauthorization_service.check(create)andgrant()toregister_agent; with no principal these forwardNonetoagentex-auth, whose
_validate_principalraises 422 → the pod crashes.Fix
Skip the create check and ownership grant in
register_agentwhen no creatoris resolvable from the principal context. By deploy time the agent already
exists and is owned from build time (
register-buildruns beforedeploy-time register), so neither applies on the self-registration path. An
authenticated caller (resolvable principal) is still enforced.
Why not the alternatives
/agents/register: pods self-register without a user login,so they'd then fail auth.
Noneprincipal: weakens authz globally; thenull only legitimately arises on this whitelisted self-reg path.
This mirrors the existing "no creator resolvable → skip" pattern already used
for the Spark resource registration in
_register_in_auth.Scope
Companion #291 fixed the analogous 422 for
/agents/register-build(byun-whitelisting it so it authenticates); this PR covers the deployed-pod
/agents/registerpath, which #291 intentionally left whitelisted.Testing
Unit tests (
test_agents_authz.py): aNone-principal register skipscheck/grant and still returns the agent (no 422); an authenticated caller
still triggers check + grant. Full
test_agents_authz.pysuite passes.Greptile Summary
This PR fixes a
CrashLoopBackOffon deployed agent pods by skipping theauthorization_service.check(create)andgrant()calls inregister_agentwhen no principal is resolvable from the context. Since/agents/registeris whitelisted and pods self-register without a user login, forwarding aNoneprincipal to agentex-auth was raising a 422 — this PR introduces_has_resolvable_creatorto gate those calls only for authenticated callers._has_resolvable_creator(principal_context)helper that checks foruser_id/service_account_idon both dict-shaped and attribute-shaped principals.check/grantcalls inregister_agentbehindenforce_ownership, leaving them untouched for authenticated callers.Confidence Score: 5/5
Safe to merge — the change is surgical, well-documented, and mirrors an existing pattern; both the unauthenticated self-register path and the authenticated path are covered by new unit tests.
The fix is narrowly scoped to gating two calls behind a well-tested helper. The helper correctly handles all principal shapes. The previously flagged untested getattr branch is now covered by test_object_principal_enforces_check_and_grant. No existing authz enforcement for authenticated callers is weakened.
No files require special attention.
Important Files Changed
_has_resolvable_creatorhelper and gatescheck/grantinregister_agentbehind it. Logic is correct and well-documented; both dict and attribute principal shapes are handled; the unauthenticated path correctly skips auth calls.TestRegisterAgentOwnershipEnforcementclass covers all three paths: None/empty/no-matching-key principal skips check+grant; dict principal with user_id enforces; SimpleNamespace principal with user_id exercises the getattr branch and enforces — addressing the previously untested branch.Reviews (2): Last reviewed commit: "test(authz): align register authz tests ..." | Re-trigger Greptile