fix(authz): skip create-check/grant on unauthenticated agent self-register by harvhan · Pull Request #292 · scaleapi/scale-agentex

harvhan · 2026-06-09T16:50:24Z

Summary

Deployed agent pods were stuck in CrashLoopBackOff on startup because
POST /agents/register returned 422.

/agents/register is whitelisted (deployed pods self-register on startup
without a user login), so the auth middleware leaves principal_context
as None. #270 added an enforced authorization_service.check(create) and
grant() to register_agent; with no principal these forward None to
agentex-auth, whose _validate_principal raises 422 → the pod crashes.

Fix

Skip the create check and ownership grant in register_agent when no creator
is resolvable from the principal context. By deploy time the agent already
exists and is owned from build time (register-build runs before
deploy-time register), so neither applies on the self-registration path. An
authenticated caller (resolvable principal) is still enforced.

Why not the alternatives

- Un-whitelist /agents/register: pods self-register without a user login,
  so they'd then fail auth.
- Make agentex-auth tolerate a None principal: weakens authz globally; the
  null only legitimately arises on this whitelisted self-reg path.

This mirrors the existing "no creator resolvable → skip" pattern already used
for the Spark resource registration in _register_in_auth.

Scope

Companion #291 fixed the analogous 422 for /agents/register-build (by
un-whitelisting it so it authenticates); this PR covers the deployed-pod
/agents/register path, which #291 intentionally left whitelisted.

Testing

Unit tests (test_agents_authz.py): a None-principal register skips
check/grant and still returns the agent (no 422); an authenticated caller
still triggers check + grant. Full test_agents_authz.py suite passes.
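A minimal sketch of what the two unit-test cases might look like, using `unittest.mock.AsyncMock` to observe whether check and grant were awaited. The `has_creator` and `register` functions here are simplified, hypothetical stand-ins for the real guard and route; they are not the code in test_agents_authz.py.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


def has_creator(principal):
    """Simplified stand-in for the guard (dict-shaped principals only)."""
    return isinstance(principal, dict) and bool(
        principal.get("user_id") or principal.get("service_account_id")
    )


async def register(principal, authz):
    """Hypothetical route body: authz calls run only for a resolvable creator."""
    if has_creator(principal):
        await authz.check(principal, "create")
    agent = {"name": "demo-agent"}
    if has_creator(principal):
        await authz.grant(principal, agent["name"])
    return agent


def make_authz():
    # Explicit AsyncMock attributes so awaits are tracked per method.
    return SimpleNamespace(check=AsyncMock(), grant=AsyncMock())


def test_none_principal_skips_check_and_grant():
    authz = make_authz()
    agent = asyncio.run(register(None, authz))
    assert agent == {"name": "demo-agent"}  # the agent is still returned
    authz.check.assert_not_awaited()
    authz.grant.assert_not_awaited()


def test_authenticated_principal_enforces_check_and_grant():
    authz = make_authz()
    principal = {"user_id": "u"}
    asyncio.run(register(principal, authz))
    authz.check.assert_awaited_once_with(principal, "create")
    authz.grant.assert_awaited_once_with(principal, "demo-agent")
```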

Greptile Summary

This PR fixes a CrashLoopBackOff on deployed agent pods by skipping the authorization_service.check(create) and grant() calls in register_agent when no principal is resolvable from the context. Since /agents/register is whitelisted and pods self-register without a user login, forwarding a None principal to agentex-auth was raising a 422 — this PR introduces _has_resolvable_creator to gate those calls only for authenticated callers.

Adds _has_resolvable_creator(principal_context) helper that checks for user_id/service_account_id on both dict-shaped and attribute-shaped principals.
Wraps the check/grant calls in register_agent behind enforce_ownership, leaving them untouched for authenticated callers.
Adds unit tests covering all three cases (None/empty principal, dict principal, object principal) and updates the integration test to reflect that the unauthenticated self-register path no longer triggers authz calls.

Confidence Score: 5/5

Safe to merge — the change is surgical, well-documented, and mirrors an existing pattern; both the unauthenticated self-register path and the authenticated path are covered by new unit tests.

The fix is narrowly scoped to gating two calls behind a well-tested helper. The helper correctly handles all principal shapes. The previously flagged untested getattr branch is now covered by test_object_principal_enforces_check_and_grant. No existing authz enforcement for authenticated callers is weakened.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| agentex/src/api/routes/agents.py | Adds `_has_resolvable_creator` helper and gates `check`/`grant` in `register_agent` behind it. Logic is correct and well-documented; both dict and attribute principal shapes are handled; the unauthenticated path correctly skips auth calls. |
| agentex/tests/unit/api/test_agents_authz.py | New `TestRegisterAgentOwnershipEnforcement` class covers all three paths: None/empty/no-matching-key principal skips check+grant; dict principal with user_id enforces; SimpleNamespace principal with user_id exercises the getattr branch and enforces, addressing the previously untested branch. |
| agentex/tests/integration/api/agents/test_agents_auth_api.py | Renames and updates the integration test to assert no authz calls on the unauthenticated register path, correctly reflecting the new behavior. Delete-gate assertions remain unchanged. |

Reviews (2): Last reviewed commit: "test(authz): align register authz tests ..."

…ister /agents/register is whitelisted, so deployed pods self-register on startup with no principal context. #270 added an enforced check(create)/grant on that path, which forwarded a None principal to agentex-auth -> _validate_principal raised 422 -> agent pods stuck in CrashLoopBackOff on startup. The agent already exists and is owned from build time (register-build runs before deploy-time register), so the create check and ownership grant don't apply on the self-registration path. Skip them when no creator is resolvable; an authenticated caller is still enforced.

greptile-apps · 2026-06-09T16:54:37Z

+    if isinstance(principal_context, dict):
+        return bool(
+            principal_context.get("user_id")
+            or principal_context.get("service_account_id")
+        )
+    return bool(
+        getattr(principal_context, "user_id", None)
+        or getattr(principal_context, "service_account_id", None)
+    )


getattr branch is never exercised with a truthy result in tests

_has_resolvable_creator has two branches: a dict path (lines 64–68) and a getattr path (lines 69–72). The tests cover None → getattr → False and {"user_id": "u"} → dict → True, but there is no test exercising getattr(..., "user_id") → True (i.e., a non-dict object with a user_id attribute). Since AgentexAuthPrincipalContext = Any and the production path may return a Pydantic model or another non-dict object from the auth gateway, the branch that would actually enforce authz for service-account-authenticated callers in production is untested. A test passing a MagicMock with user_id = "sa-1" set would close this gap.


The legacy-authz integration test mocks agentex-auth, so it asserted the None-principal check/grant the real service 422s on -- masking the deployed-pod crashloop. Align the register/create portion: the whitelisted self-registration path forwards no check/grant (the agent is already owned from build time); the delete portion (authenticated) is unchanged. Unit tests cover the route guard directly: no/empty/creatorless principal skips check+grant (still returns the agent, no 422), while a dict- or object-shaped principal with a user/service id enforces both.

## Summary

- Restore legacy `/agents/register` compatibility by accepting `principal_context` in the register request body.
- Use the request-state principal when present, otherwise fall back to the body principal for create-check and ownership grant.
- Keep the #292 behavior where register skips authz when no resolvable principal is provided, avoiding the unauthenticated self-register crashloop.
- Regenerate the Agentex OpenAPI spec.

## Validation

- `uv run pytest agentex/tests/unit/api/test_agents_authz.py -q`
- `uv run ruff check agentex/src/api/routes/agents.py agentex/src/api/schemas/agents.py agentex/tests/unit/api/test_agents_authz.py`
- `uv run ruff format --check agentex/src/api/routes/agents.py agentex/src/api/schemas/agents.py agentex/tests/unit/api/test_agents_authz.py`
- `make gen-openapi`
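The principal fallback described in this follow-up might look roughly like the sketch below. It assumes "present" means "not None"; the actual rule in the follow-up PR may differ. `effective_principal` and `should_enforce` are hypothetical names introduced here, and the guard is the helper quoted in the #292 diff.

```python
from typing import Any


def _has_resolvable_creator(principal_context: Any) -> bool:
    """Guard from #292: True when a user or service-account id is available."""
    if isinstance(principal_context, dict):
        return bool(
            principal_context.get("user_id")
            or principal_context.get("service_account_id")
        )
    return bool(
        getattr(principal_context, "user_id", None)
        or getattr(principal_context, "service_account_id", None)
    )


def effective_principal(state_principal: Any, body_principal: Any) -> Any:
    """Prefer the request-state principal; otherwise use the body principal.

    Assumption: "present" is interpreted as "is not None".
    """
    return state_principal if state_principal is not None else body_principal


def should_enforce(state_principal: Any, body_principal: Any) -> bool:
    """Create-check and grant run only when the chosen principal is resolvable."""
    return _has_resolvable_creator(effective_principal(state_principal, body_principal))
```

Under these assumptions, a legacy caller that sends `principal_context` only in the body is enforced again, while a pod that sends neither still skips authz as in #292.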

harvhan requested a review from a team as a code owner June 9, 2026 16:50

rpatel-scale approved these changes Jun 9, 2026

greptile-apps Bot reviewed Jun 9, 2026

harvhan merged commit 63f89e7 into main Jun 9, 2026
30 checks passed

harvhan deleted the harvhan/fix-register-null-principal branch June 9, 2026 17:13

rpatel-scale mentioned this pull request Jun 18, 2026

fix(authz): grant legacy agent register principal #325

Merged

harvhan merged 2 commits into main from harvhan/fix-register-null-principal