fix(authz): mint agentex resource ownership on create under FGAC #291
Merged
Two issues prevented resource ownership from being recorded when fine-grained access control is enforced:

- /agents/register-build was matched by the /agents/register entry in WHITELISTED_ROUTES through a prefix (startswith) check, so it bypassed authentication. With no principal, the ownership grant/register for the newly built agent ran with a null principal and failed the create check. Use a boundary-aware match (exact, or a true sub-path under the route) so /agents/register stays whitelisted (pod self-registration) while /agents/register-build authenticates and carries the caller principal.
- The _register_*_in_auth helpers resolved the creator via getattr(principal_context, "user_id"/"service_account_id"), but principal_context is an untyped dict, so both were always None and registration was skipped with "no creator resolvable" for agent, agent_api_key, and schedule. Read the fields from the dict (with an attribute-access fallback) so register_resource fires for the resource and its children.
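The dict-safe creator lookup described in the second item can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `_principal_field` and `resolve_creator` are hypothetical, and only the field names `user_id` and `service_account_id` come from the description above.

```python
from typing import Any, Optional, Tuple


def _principal_field(principal_context: Any, name: str) -> Optional[str]:
    """Read one field from the principal context (hypothetical helper).

    Dicts are read with .get(); anything else falls back to attribute
    access, so a future typed model would keep working.
    """
    if principal_context is None:
        return None
    if isinstance(principal_context, dict):
        return principal_context.get(name)
    return getattr(principal_context, name, None)


def resolve_creator(principal_context: Any) -> Optional[Tuple[str, str]]:
    """Return (field_name, creator_id), or None when no creator is resolvable."""
    for field_name in ("user_id", "service_account_id"):
        value = _principal_field(principal_context, field_name)
        if value:
            return field_name, value
    return None


if __name__ == "__main__":
    principal = {"user_id": "user-123", "account_id": "acct-1"}
    # Attribute access on a dict never finds the key, which is the bug described above.
    print(getattr(principal, "user_id", None))
    print(resolve_creator(principal))
```

Returning `None` for the "no creator resolvable" case keeps the caller's skip path explicit instead of silently registering a resource without an owner.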
Companion (authz-service side): https://github.com/scaleapi/agentex/pull/369 (coerces the dict principal before Spark routing and routes the create-gate to the tenant)
rpatel-scale approved these changes on Jun 9, 2026
#271 made denied agent-route access collapse to 404 instead of 403 and updated the unit authz tests, but this events integration test still asserted 403 -- leaving it red on main and on any open PR. Align the expectation and name with the 404 convention. CI confirms the events route already returns 404 for a denied agent (the assertion failed with `404 == 403`); this only updates the test to match the intended behavior.
Summary
Resource ownership was not being recorded on agentex resource creation when fine-grained access control is enforced. Two independent issues:
1. `register-build` bypassed authentication
`is_whitelisted_route` matched whitelisted entries with a plain `startswith`, so the `/agents/register` entry also matched `/agents/register-build`. The build-time registration endpoint therefore skipped authentication and ran with no principal context: its ownership grant/register executed with a null principal, and the create check failed (HTTP 422) under enforcement.
Fix: boundary-aware match. A route matches only itself or a true sub-path (`route + "/"`). `/agents/register` stays whitelisted (deployed pods self-register without a user principal); `/agents/register-build` now authenticates and carries the caller principal that becomes the owner.
2. Creator never resolved from the principal context
`_register_in_auth` / `_register_api_key_in_auth` / `_register_schedule_in_auth` read the creator via `getattr(principal_context, "user_id" / "service_account_id")`, but the principal context is an untyped dict, so both were always `None`. Registration was skipped with "no creator resolvable", so the resource (and its children) were never written to the authorization graph.
Fix: read the fields from the dict (with an attribute-access fallback), so `register_resource` fires for agent, agent_api_key, and schedule (covers both user and service-account creators).
Testing
Verified end-to-end against a live authorization backend: the owner can see/read created resources, while a different user in the same account is correctly isolated when enforcement is on and falls back to account-level access when enforcement is off. The boundary-whitelist behavior was unit-checked across exact / sub-path / prefix-sibling cases.
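The exact / sub-path / prefix-sibling cases can be illustrated with a small sketch of the boundary-aware match. The match expression follows the PR description; the function signature and the contents of `WHITELISTED_ROUTES` here are assumptions made for the example, not the real values in `middleware_utils.py`.

```python
from typing import Iterable

# Illustrative list only; the real WHITELISTED_ROUTES contents are not shown in this PR.
WHITELISTED_ROUTES = ("/agents/register",)


def is_whitelisted_route(path: str, routes: Iterable[str] = WHITELISTED_ROUTES) -> bool:
    """A route matches only itself or a true sub-path under it."""
    return any(path == route or path.startswith(route + "/") for route in routes)


def is_whitelisted_route_old(path: str, routes: Iterable[str] = WHITELISTED_ROUTES) -> bool:
    """Previous behavior: a plain prefix check, which also matches siblings."""
    return any(path.startswith(route) for route in routes)


if __name__ == "__main__":
    for path in ("/agents/register", "/agents/register/pod-1", "/agents/register-build"):
        print(path, is_whitelisted_route_old(path), is_whitelisted_route(path))
```

Appending `"/"` before the prefix check is what separates a sub-path from a sibling route that merely shares the same leading characters.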
Follow-up (not in this PR)
The root cause of #2 is that the principal context type is effectively untyped (`Any`), so attribute access silently no-ops. A cleaner long-term fix is to make it a typed model so attribute access works everywhere (and the auth-middleware success log stops reporting `user_id=None`). This PR keeps the change minimal and localized.
Greptile Summary
This PR fixes two independent authorization bugs that prevented resource ownership from being recorded on creation under FGAC: a whitelist boundary check that accidentally exempted `/agents/register-build` from authentication, and a principal-context field-access bug (dict vs typed model) that caused every creator lookup to silently return `None`.
- `middleware_utils.py`: replaces bare `startswith` with `path == route or path.startswith(route + "/")` so `/agents/register` no longer accidentally covers `/agents/register-build`.
- `agents_use_case.py`, `agent_api_keys_use_case.py`, `schedule_service.py`: reads `user_id` / `service_account_id` via `.get()` when `principal_context` is a dict, falling back to `getattr` for any future typed model.
- `test_events_authz_api.py`: updates the expected response code for a denied-agent list request from 403 → 404, consistent with the convention established in "feat(agents): collapse denied agent route access to 404 instead of 403" (#271).
Confidence Score: 5/5
Safe to merge — the two targeted fixes are correct, the whitelist logic is well-reasoned, and the dict-safe principal extraction pattern is applied consistently across all three registration sites.
The whitelist boundary fix correctly narrows what was an accidental over-match. The dict-safe lookup change is a straightforward type-check guard that matches how the principal context is actually structured. The test update is internally consistent with the 404-on-denial convention already present in the same file. No regressions are introduced.
No files require special attention beyond the minor dead-code redundancy in middleware_utils.py.
Sequence Diagram
sequenceDiagram
    participant Client
    participant Middleware as AuthMiddleware
    participant WL as is_whitelisted_route
    participant AuthGW as AuthGateway (/v1/authn)
    participant UseCase as AgentsUseCase._register_in_auth
    participant AuthSvc as AuthorizationService (Spark)
    Client->>Middleware: POST /agents/register-build
    Middleware->>WL: is_whitelisted_route("/agents/register-build")
    Note over WL: OLD: startswith("/agents/register") → True (bypassed auth!)<br/>NEW: startswith(route+"/") → False
    WL-->>Middleware: False (not whitelisted)
    Middleware->>AuthGW: verify headers
    AuthGW-->>Middleware: "principal_context {user_id, account_id}"
    Middleware->>UseCase: register_agent(...)
    UseCase->>UseCase: "isinstance(principal_context, dict)?<br/>OLD: getattr → None, skipped<br/>NEW: dict.get() → user-123"
    UseCase->>AuthSvc: "register_resource(agent, owner=user-123)"
    AuthSvc-->>UseCase: OK
    UseCase-->>Client: 200 agent created with ownership
Comments Outside Diff (1)
agentex/src/api/middleware_utils.py, lines 152-155: `user_id=None`
`verify_auth_gateway` logs `user_id` and `account_id` via `getattr(principal_context, …, None)` (lines 153–154), but `principal_context` is a dict, so `getattr` always yields `None`. Every authenticated request will log `user_id=None, account_id=None`, making the success log useless for debugging and correlation. The PR description mentions this as a follow-up, but it is worth flagging since it degrades observability immediately after this fix lands.
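One way the typed-model follow-up could look is sketched below. This is only an assumption about its shape: `PrincipalContext` and `from_raw` are hypothetical names, and the three fields are the ones mentioned in this PR. It also shows why the current `getattr` call logs `None` for a dict.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class PrincipalContext:
    """Hypothetical typed principal; field names taken from the PR discussion."""

    user_id: Optional[str] = None
    service_account_id: Optional[str] = None
    account_id: Optional[str] = None

    @classmethod
    def from_raw(cls, raw: Mapping[str, Any]) -> "PrincipalContext":
        # Coerce the dict once at the boundary so later code can use attributes.
        return cls(
            user_id=raw.get("user_id"),
            service_account_id=raw.get("service_account_id"),
            account_id=raw.get("account_id"),
        )


if __name__ == "__main__":
    raw = {"user_id": "user-123", "account_id": "acct-1"}
    # getattr on a dict looks for an attribute, not a key, so the default is returned.
    print(getattr(raw, "user_id", None))
    principal = PrincipalContext.from_raw(raw)
    print(principal.user_id, principal.account_id)
```

With a model like this, both the registration helpers and the success log could rely on attribute access without per-call `isinstance` checks.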
Reviews (4): Last reviewed commit: "Merge branch 'main' into harvhan/fgac-re..."