Skip to content

fix(authz): mint agentex resource ownership on create under FGAC#291

Merged
harvhan merged 4 commits into
mainfrom
harvhan/fgac-register-ownership
Jun 9, 2026
Merged

fix(authz): mint agentex resource ownership on create under FGAC#291
harvhan merged 4 commits into
mainfrom
harvhan/fgac-register-ownership

Conversation

@harvhan

@harvhan harvhan commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Resource ownership was not being recorded on agentex resource creation when fine-grained access control is enforced. Two independent issues:

1. register-build bypassed authentication

is_whitelisted_route matched whitelisted entries with a plain startswith, so the /agents/register entry also matched /agents/register-build. The build-time registration endpoint therefore skipped authentication and ran with no principal context — its ownership grant/register executed with a null principal, and the create check failed (HTTP 422) under enforcement.

Fix: boundary-aware match — a route matches only itself or a true sub-path (route + "/"). /agents/register stays whitelisted (deployed pods self-register without a user principal); /agents/register-build now authenticates and carries the caller principal that becomes the owner.

2. Creator never resolved from the principal context

_register_in_auth / _register_api_key_in_auth / _register_schedule_in_auth read the creator via getattr(principal_context, "user_id" / "service_account_id"), but the principal context is an untyped dict, so both were always None. Registration was skipped with "no creator resolvable", so the resource (and its children) were never written to the authorization graph.

Fix: read the fields from the dict (with an attribute-access fallback), so register_resource fires for agent, agent_api_key, and schedule (covers both user and service-account creators).

Testing

Verified end-to-end against a live authorization backend: the owner can see/read created resources, while a different user in the same account is correctly isolated when enforcement is on and falls back to account-level access when enforcement is off. The boundary-whitelist behavior was unit-checked across exact / sub-path / prefix-sibling cases.

Follow-up (not in this PR)

The root cause of #2 is that the principal context type is effectively untyped (Any), so attribute access silently no-ops. A cleaner long-term fix is to make it a typed model so attribute access works everywhere (and the auth-middleware success log stops reporting user_id=None). This PR keeps the change minimal and localized.

Greptile Summary

This PR fixes two independent authorization bugs that prevented resource ownership from being recorded on creation under FGAC: a whitelist boundary check that accidentally exempted /agents/register-build from authentication, and a principal-context field-access bug (dict vs typed model) that caused every creator lookup to silently return None.

  • Boundary-aware whitelist (middleware_utils.py): replaces bare startswith with path == route or path.startswith(route + "/") so /agents/register no longer accidentally covers /agents/register-build.
  • Dict-safe creator lookup (agents_use_case.py, agent_api_keys_use_case.py, schedule_service.py): reads user_id/service_account_id via .get() when principal_context is a dict, falling back to getattr for any future typed model.
  • Test alignment (test_events_authz_api.py): updates the expected response code for a denied-agent list request from 403 → 404, consistent with the convention established in feat(agents): collapse denied agent route access to 404 instead of 403 #271.

Confidence Score: 5/5

Safe to merge — the two targeted fixes are correct, the whitelist logic is well-reasoned, and the dict-safe principal extraction pattern is applied consistently across all three registration sites.

The whitelist boundary fix correctly narrows what was an accidental over-match. The dict-safe lookup change is a straightforward type-check guard that matches how the principal context is actually structured. The test update is internally consistent with the 404-on-denial convention already present in the same file. No regressions are introduced.

No files require special attention beyond the minor dead-code redundancy in middleware_utils.py.

Important Files Changed

Filename Overview
agentex/src/api/middleware_utils.py Boundary-aware whitelist fix; path == route inside any() is redundant dead code (set lookup already handles exact matches), but the core logic is correct.
agentex/src/domain/use_cases/agents_use_case.py Dict-safe principal_context field extraction; isinstance guard correctly handles the actual dict shape from /v1/authn.
agentex/src/domain/use_cases/agent_api_keys_use_case.py Same dict-safe fix as agents_use_case.py; consistent pattern applied correctly.
agentex/src/domain/services/schedule_service.py Same dict-safe fix as the use-case files; schedule registration now correctly resolves the creator identity.
agentex/tests/integration/api/events/test_events_authz_api.py Test updated to expect 404 instead of 403 on denied agent-list access, consistent with the established convention in #271.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Middleware as AuthMiddleware
    participant WL as is_whitelisted_route
    participant AuthGW as AuthGateway (/v1/authn)
    participant UseCase as AgentsUseCase._register_in_auth
    participant AuthSvc as AuthorizationService (Spark)

    Client->>Middleware: POST /agents/register-build
    Middleware->>WL: is_whitelisted_route("/agents/register-build")
    Note over WL: OLD: startswith("/agents/register") → True (bypassed auth!)<br/>NEW: startswith(route+"/") → False
    WL-->>Middleware: False (not whitelisted)
    Middleware->>AuthGW: verify headers
    AuthGW-->>Middleware: "principal_context {user_id, account_id}"
    Middleware->>UseCase: register_agent(...)
    UseCase->>UseCase: "isinstance(principal_context, dict)?<br/>OLD: getattr → None, skipped<br/>NEW: dict.get() → user-123"
    UseCase->>AuthSvc: "register_resource(agent, owner=user-123)"
    AuthSvc-->>UseCase: OK
    UseCase-->>Client: 200 agent created with ownership
Loading

Comments Outside Diff (1)

  1. agentex/src/api/middleware_utils.py, line 152-155 (link)

    P2 Authentication success log still reports user_id=None

    verify_auth_gateway logs user_id and account_id via getattr(principal_context, …, None) (lines 153–154), but principal_context is a dict, so getattr always yields None. Every authenticated request will log user_id=None, account_id=None, making the success log useless for debugging and correlation. The PR description mentions this as a follow-up, but it is worth flagging since it degrades observability immediately after this fix lands.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: agentex/src/api/middleware_utils.py
    Line: 152-155
    
    Comment:
    **Authentication success log still reports `user_id=None`**
    
    `verify_auth_gateway` logs `user_id` and `account_id` via `getattr(principal_context, …, None)` (lines 153–154), but `principal_context` is a dict, so `getattr` always yields `None`. Every authenticated request will log `user_id=None, account_id=None`, making the success log useless for debugging and correlation. The PR description mentions this as a follow-up, but it is worth flagging since it degrades observability immediately after this fix lands.
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Cursor Fix in Claude Code Fix in Codex

Fix All in Cursor Fix All in Claude Code Fix All in Codex

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
agentex/src/api/middleware_utils.py:50-52
The `path == route` branch inside the `any()` generator is dead code. Because `path in whitelisted_routes` is evaluated first (O(1) set lookup), control only reaches the `any()` when `path` is not in the set — which means `path == route` will be `False` for every element in `whitelisted_routes`. The `path == route` arm can never contribute a `True` that wasn't already caught above. Removing it makes the intent clearer and slightly faster.

```suggestion
    return path in whitelisted_routes or any(
        path.startswith(route + "/") for route in whitelisted_routes
    )
```

Reviews (4): Last reviewed commit: "Merge branch 'main' into harvhan/fgac-re..." | Re-trigger Greptile

Two issues prevented resource ownership from being recorded when
fine-grained access control is enforced:

- /agents/register-build was matched by the /agents/register entry in
  WHITELISTED_ROUTES through a prefix (startswith) check, so it bypassed
  authentication. With no principal, the ownership grant/register for the
  newly built agent ran with a null principal and failed the create check.
  Use a boundary-aware match (exact, or a true sub-path under the route)
  so /agents/register stays whitelisted (pod self-registration) while
  /agents/register-build authenticates and carries the caller principal.

- The _register_*_in_auth helpers resolved the creator via
  getattr(principal_context, "user_id"/"service_account_id"), but
  principal_context is an untyped dict, so both were always None and
  registration was skipped with "no creator resolvable" for agent,
  agent_api_key, and schedule. Read the fields from the dict (with an
  attribute-access fallback) so register_resource fires for the resource
  and its children.
@harvhan harvhan requested a review from a team as a code owner June 9, 2026 08:00
@harvhan

harvhan commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Companion (authz-service side): https://github.com/scaleapi/agentex/pull/369 — coerces the dict principal before Spark routing and routes the create-gate to the tenant can_create check. Both are needed for end-to-end ownership under enforcement.

rpatel-scale and others added 3 commits June 9, 2026 10:34
#271 made denied agent-route access collapse to 404 instead of 403 and
updated the unit authz tests, but this events integration test still
asserted 403 -- leaving it red on main and on any open PR. Align the
expectation and name with the 404 convention. CI confirms the events
route already returns 404 for a denied agent (the assertion failed with
`404 == 403`); this only updates the test to match the intended behavior.
@harvhan harvhan merged commit 567f957 into main Jun 9, 2026
30 checks passed
@harvhan harvhan deleted the harvhan/fgac-register-ownership branch June 9, 2026 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants