Skip to content

fix(authz): skip create-check/grant on unauthenticated agent self-register#292

Merged
harvhan merged 2 commits into
mainfrom
harvhan/fix-register-null-principal
Jun 9, 2026
Merged

fix(authz): skip create-check/grant on unauthenticated agent self-register#292
harvhan merged 2 commits into
mainfrom
harvhan/fix-register-null-principal

Conversation

@harvhan

@harvhan harvhan commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Deployed agent pods were stuck in CrashLoopBackOff on startup, 422-ing
on POST /agents/register.

/agents/register is whitelisted (deployed pods self-register on startup
without a user login), so the auth middleware leaves principal_context
as None. #270 added an enforced authorization_service.check(create) and
grant() to register_agent; with no principal these forward None to
agentex-auth, whose _validate_principal raises 422 → the pod crashes.

Fix

Skip the create check and ownership grant in register_agent when no creator
is resolvable from the principal context. By deploy time the agent already
exists and is owned from build time (register-build runs before
deploy-time register), so neither applies on the self-registration path. An
authenticated caller (resolvable principal) is still enforced.

Why not the alternatives

  • Un-whitelist /agents/register: pods self-register without a user login,
    so they'd then fail auth.
  • Make agentex-auth tolerate a None principal: weakens authz globally; the
    null only legitimately arises on this whitelisted self-reg path.

This mirrors the existing "no creator resolvable → skip" pattern already used
for the Spark resource registration in _register_in_auth.

Scope

Companion #291 fixed the analogous 422 for /agents/register-build (by
un-whitelisting it so it authenticates); this PR covers the deployed-pod
/agents/register
path, which #291 intentionally left whitelisted.

Testing

Unit tests (test_agents_authz.py): a None-principal register skips
check/grant and still returns the agent (no 422); an authenticated caller
still triggers check + grant. Full test_agents_authz.py suite passes.

Greptile Summary

This PR fixes a CrashLoopBackOff on deployed agent pods by skipping the authorization_service.check(create) and grant() calls in register_agent when no principal is resolvable from the context. Since /agents/register is whitelisted and pods self-register without a user login, forwarding a None principal to agentex-auth was raising a 422 — this PR introduces _has_resolvable_creator to gate those calls only for authenticated callers.

  • Adds _has_resolvable_creator(principal_context) helper that checks for user_id/service_account_id on both dict-shaped and attribute-shaped principals.
  • Wraps the check/grant calls in register_agent behind enforce_ownership, leaving them untouched for authenticated callers.
  • Adds unit tests covering all three cases (None/empty principal, dict principal, object principal) and updates the integration test to reflect that the unauthenticated self-register path no longer triggers authz calls.

Confidence Score: 5/5

Safe to merge — the change is surgical, well-documented, and mirrors an existing pattern; both the unauthenticated self-register path and the authenticated path are covered by new unit tests.

The fix is narrowly scoped to gating two calls behind a well-tested helper. The helper correctly handles all principal shapes. The previously flagged untested getattr branch is now covered by test_object_principal_enforces_check_and_grant. No existing authz enforcement for authenticated callers is weakened.

No files require special attention.

Important Files Changed

Filename Overview
agentex/src/api/routes/agents.py Adds _has_resolvable_creator helper and gates check/grant in register_agent behind it. Logic is correct and well-documented; both dict and attribute principal shapes are handled; the unauthenticated path correctly skips auth calls.
agentex/tests/unit/api/test_agents_authz.py New TestRegisterAgentOwnershipEnforcement class covers all three paths: None/empty/no-matching-key principal skips check+grant; dict principal with user_id enforces; SimpleNamespace principal with user_id exercises the getattr branch and enforces — addressing the previously untested branch.
agentex/tests/integration/api/agents/test_agents_auth_api.py Renames and updates the integration test to assert no authz calls on the unauthenticated register path, correctly reflecting the new behavior. Delete-gate assertions remain unchanged.

Reviews (2): Last reviewed commit: "test(authz): align register authz tests ..." | Re-trigger Greptile

…ister

/agents/register is whitelisted, so deployed pods self-register on
startup with no principal context. #270 added an enforced
check(create)/grant on that path, which forwarded a None principal to
agentex-auth -> _validate_principal raised 422 -> agent pods stuck in
CrashLoopBackOff on startup.

The agent already exists and is owned from build time (register-build
runs before deploy-time register), so the create check and ownership
grant don't apply on the self-registration path. Skip them when no
creator is resolvable; an authenticated caller is still enforced.
@harvhan harvhan requested a review from a team as a code owner June 9, 2026 16:50
Comment on lines +64 to +72
if isinstance(principal_context, dict):
return bool(
principal_context.get("user_id")
or principal_context.get("service_account_id")
)
return bool(
getattr(principal_context, "user_id", None)
or getattr(principal_context, "service_account_id", None)
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 getattr branch is never exercised with a truthy result in tests

_has_resolvable_creator has two branches: a dict path (lines 64–68) and a getattr path (lines 69–72). The tests cover None → getattr → False and {"user_id": "u"} → dict → True, but there is no test exercising getattr(..., "user_id") → True (i.e., a non-dict object with a user_id attribute). Since AgentexAuthPrincipalContext = Any and the production path may return a Pydantic model or another non-dict object from the auth gateway, the branch that would actually enforce authz for service-account-authenticated callers in production is untested. A test passing a MagicMock with user_id = "sa-1" set would close this gap.

Prompt To Fix With AI
This is a comment left during a code review.
Path: agentex/src/api/routes/agents.py
Line: 64-72

Comment:
**`getattr` branch is never exercised with a truthy result in tests**

`_has_resolvable_creator` has two branches: a `dict` path (lines 64–68) and a `getattr` path (lines 69–72). The tests cover `None → getattr → False` and `{"user_id": "u"} → dict → True`, but there is no test exercising `getattr(..., "user_id") → True` (i.e., a non-dict object with a `user_id` attribute). Since `AgentexAuthPrincipalContext = Any` and the production path may return a Pydantic model or another non-dict object from the auth gateway, the branch that would actually enforce authz for service-account-authenticated callers in production is untested. A test passing a `MagicMock` with `user_id = "sa-1"` set would close this gap.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Cursor Fix in Claude Code Fix in Codex

The legacy-authz integration test mocks agentex-auth, so it asserted the
None-principal check/grant the real service 422s on -- masking the
deployed-pod crashloop. Align the register/create portion: the
whitelisted self-registration path forwards no check/grant (the agent is
already owned from build time); the delete portion (authenticated) is
unchanged.

Unit tests cover the route guard directly: no/empty/creatorless
principal skips check+grant (still returns the agent, no 422), while a
dict- or object-shaped principal with a user/service id enforces both.
@harvhan harvhan merged commit 63f89e7 into main Jun 9, 2026
30 checks passed
@harvhan harvhan deleted the harvhan/fix-register-null-principal branch June 9, 2026 17:13
rpatel-scale added a commit that referenced this pull request Jun 18, 2026
## Summary

- Restore legacy `/agents/register` compatibility by accepting
`principal_context` in the register request body.
- Use the request-state principal when present, otherwise fall back to
the body principal for create-check and ownership grant.
- Keep the #292 behavior where register skips authz when no resolvable
principal is provided, avoiding the unauthenticated self-register
crashloop.
- Regenerate the Agentex OpenAPI spec.

## Validation

- `uv run pytest agentex/tests/unit/api/test_agents_authz.py -q`
- `uv run ruff check agentex/src/api/routes/agents.py
agentex/src/api/schemas/agents.py
agentex/tests/unit/api/test_agents_authz.py`
- `uv run ruff format --check agentex/src/api/routes/agents.py
agentex/src/api/schemas/agents.py
agentex/tests/unit/api/test_agents_authz.py`
- `make gen-openapi`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants