
Define OBOConfig CRD schema for Entra OBO flow #5494

Merged
tgrunnagle merged 6 commits into main from obo-impl-5 on Jun 12, 2026
Conversation

tgrunnagle (Contributor) commented Jun 10, 2026

Summary

The OBOConfig struct on MCPExternalAuthConfigSpec (the obo external-auth
type, for the Microsoft Entra On-Behalf-Of flow) was an empty placeholder
(type OBOConfig struct{}), so the OBO type had no config surface to read.
This populates OBOConfig with a real schema so the on-behalf-of flow has
something to configure. The schema is structurally valid in upstream (OSS)
builds but inert: an obo-typed config still surfaces
status.conditions[Valid] = False / Reason: EnterpriseRequired at reconcile,
because no OBO handler is registered in upstream builds. Admission validates
per-field shape (patterns and length/item bounds) via kubebuilder markers; all
presence and combination requirements are enforced by the registered handler at
reconcile, and the Go Validate() OBO arm defers to it.

  • Why: the OBO external-auth type needs a user-facing config surface for the
    Entra OBO flow. The struct was a deferred placeholder, so nothing could be
    configured and the schema only admitted obo: {}.
  • What: define the OBOConfig fields, regenerate deepcopy / both CRD
    manifests / the CRD API docs, and add an envtest suite that exercises the new
    admission-time validation through a real apiserver.

New OBOConfig fields:

  • tenantId — optional at the CRD level (the operator enforces presence);
    constrained to a GUID-or-domain pattern when set (well-known aliases like
    common are rejected, since an OBO confidential-client exchange must target a
    specific tenant).
  • authority — optional HTTPS base for sovereign / B2C / CIAM clouds; a path is
    permitted and is prefixed before the tenant segment.
  • clientId and clientSecretRef — optional at the CRD level (enforced by the
    operator per auth mode), left optional so certificate / workload-identity
    client auth can be added later without a breaking schema change.
  • audience and/or scopes — at least one is required; the operator enforces
    that at reconcile (not at admission), so the schema keeps admitting the empty
    obo: {} placeholder shipped in v0.29.3.
  • subjectTokenProviderName — optional; selects the OBO subject source (inbound
    end-user token vs. a named upstream provider's token).
  • cacheSkew — optional metav1.Duration for the token cache's expiry skew.

Field names and semantics track the shared OBO runtime wire contract. There is
deliberately no externalTokenHeaderName: the OBO subject is sourced from the
authenticated Identity, never from an inbound request header.
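
As a rough illustration of the field list above, the struct might look something like the following sketch. The bounds (253, 20, 256) come from this PR's test descriptions; the patterns are omitted because they are not quoted here, and the `SecretKeyRef` type, the string stand-in for `metav1.Duration`, and the JSON tags are assumptions made so the example compiles with the standard library alone. The generated CRD manifest remains the source of truth.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SecretKeyRef is a hypothetical stand-in for the operator's secret reference type.
type SecretKeyRef struct {
	Name string `json:"name"`
	Key  string `json:"key"`
}

// OBOConfig sketches the fields listed above. Every field is optional so that
// the empty `obo: {}` placeholder keeps round-tripping through the schema.
type OBOConfig struct {
	// Pattern marker omitted (GUID-or-domain in the real type).
	// +kubebuilder:validation:MaxLength=253
	// +optional
	TenantID string `json:"tenantId,omitempty"`
	// Pattern marker omitted (HTTPS base, path permitted).
	// +optional
	Authority string `json:"authority,omitempty"`
	// +optional
	ClientID string `json:"clientId,omitempty"`
	// +optional
	ClientSecretRef *SecretKeyRef `json:"clientSecretRef,omitempty"`
	// +optional
	Audience string `json:"audience,omitempty"`
	// +kubebuilder:validation:MaxItems=20
	// +kubebuilder:validation:items:MaxLength=256
	// +optional
	Scopes []string `json:"scopes,omitempty"`
	// Pattern marker omitted (DNS label in the real type).
	// +optional
	SubjectTokenProviderName string `json:"subjectTokenProviderName,omitempty"`
	// CacheSkew is a metav1.Duration in the real type; a string stands in here.
	// +optional
	CacheSkew string `json:"cacheSkew,omitempty"`
}

// encode marshals the config to JSON, panicking on the (unexpected) error path.
func encode(c OBOConfig) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// The zero value serializes to the empty placeholder.
	fmt.Println(encode(OBOConfig{})) // prints {}
	// A partial config serializes only the fields that are set.
	fmt.Println(encode(OBOConfig{TenantID: "contoso.example", Audience: "api://downstream"}))
}
```

Because every field carries `omitempty`, the zero value serializes back to `{}`, which is the round-trip property the schema has to preserve.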

Large PR Justification

The PR exceeds 1000 lines because it is dominated by generated artifacts that
must land atomically with the type definition that produces them:

  • ~510 lines — regenerated CRD manifests. Adding the eight OBOConfig
    fields regenerates the spec.obo schema in both CRD versions (v1beta1 and
    v1alpha1) and in both copies the chart ships (files/crds/ and the Helm
    templates/).
  • ~55 lines — regenerated CRD API reference (docs/operator/crd-api.md).
  • ~17 lines — regenerated deepcopy (zz_generated.deepcopy.go).

That generated output (~580 lines) is mechanical and cannot be split from the
source without leaving the tree non-buildable / non-regenerable. The
hand-written change is a single logical unit — one CRD type (OBOConfig,
~185 lines) plus its envtest suite (~290 lines). Per the contributing
guidelines, generated code and test-only changes are accepted reasons for a
larger PR.

Type of change

  • New feature

Test plan

  • Unit tests (task operator-test) — the api/v1beta1 package, including
    the updated Validate() cases covering a fully populated OBOConfig and a
    minimal OBOConfig that passes the Go method (field checks are deferred to
    admission / reconcile).
  • Linting (task lint-fix) — clean for the changed files.
  • Manual testing (describe below)

Ran the new envtest integration suite (task operator-test-integration,
mcp-external-auth): 20 specs validating the kubebuilder schema through a real
apiserver — the empty obo: {} placeholder and partial configs are admitted,
while per-field patterns and bounds reject malformed values (non-HTTPS /
userinfo / query / fragment / trailing-slash authority, GUID/domain tenantId
incl. the 253-char cap, subjectTokenProviderName DNS-label, and the scopes
count/length caps). All pass.

API Compatibility

  • This PR does not break the v1beta1 API, OR the api-break-allowed label
    is applied and the migration guidance is described above.

This is additive: OBOConfig was previously an empty object, and every newly
introduced field is optional, so the schema still admits obo: {}. Stored
obo: {} objects do not exist in practice (the type was inert), and the obo
block is only required when type is obo.

Changes

| File | Change |
| --- | --- |
| cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go | Populate OBOConfig with the Entra OBO fields and kubebuilder markers (no cross-field CEL rule); update doc comments and the Validate() OBO arm rationale. |
| cmd/thv-operator/api/v1beta1/zz_generated.deepcopy.go | Regenerated deepcopy for the new OBOConfig fields. |
| cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types_test.go | Update Validate() cases: a fully populated OBOConfig and a minimal block that passes Go validation (field checks deferred). |
| cmd/thv-operator/test-integration/mcp-external-auth/obo_validation_test.go | New envtest suite (20 specs) exercising the kubebuilder markers via a real apiserver. |
| deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml | Regenerated CRD manifest. |
| deploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml | Regenerated CRD manifest. |
| docs/operator/crd-api.md | Regenerated CRD API docs for OBOConfig. |

Does this introduce a user-facing change?

Yes. The obo external-auth type now exposes a configurable schema
(tenantId, authority, clientId, clientSecretRef, audience, scopes,
subjectTokenProviderName, cacheSkew) with admission-time validation. In
upstream builds the type remains inert — an obo-typed MCPExternalAuthConfig
still reports Valid=False / Reason=EnterpriseRequired at reconcile because no
OBO handler is registered.

Special notes for reviewers

  • subjectTokenProviderName vs sibling subjectProviderName. Other configs
    in this file (TokenExchangeConfig, AWSStsConfig) name the equivalent field
    subjectProviderName. OBOConfig deliberately uses subjectTokenProviderName
    to track the downstream obo.MiddlewareParameters field name exactly — the
    divergence is intentional (documented in the struct doc), not an oversight.
  • No required fields and no cross-field CEL rule. spec.obo shipped as an
    empty {} placeholder in v0.29.3, so the schema must keep admitting {} and
    any subset of fields to preserve that round-trip (the CRD schema-compatibility
    check enforces this). The Go Validate() OBO arm likewise does not check
    OBOConfig fields. All presence/combination requirements — a tenant, a
    client-auth credential, at least one of audience/scopes, a non-negative
    cacheSkew — are owned by the registered OBO handler at reconcile, which
    reports violations as Valid=False / Reason=InvalidConfig. Upstream builds
    stay inert (Reason=EnterpriseRequired).
  • Admission validates per-field shape only. Patterns and length/item bounds
    (tenantId GUID/domain + 253 cap, authority HTTPS with no userinfo/query/
    fragment/trailing-slash, subjectTokenProviderName DNS-label, scopes caps)
    validate values that are present. The authority pattern is intentionally
    stricter than the downstream validateHTTPSURL (which accepts http-loopback
    and a trailing slash) — rejecting those, and userinfo, at admission is the
    safe direction.

Generated with Claude Code

tgrunnagle and others added 2 commits June 10, 2026 14:48
The mcpv1beta1.OBOConfig struct was an empty placeholder deferred to a
follow-up RFC. The enterprise OBO overlay needs a user-facing config
surface to read, so populate OBOConfig with the fields the Microsoft
Entra OBO flow requires.

Field names and semantics track the shared obo.MiddlewareParameters
wire contract (not the upstream TokenExchangeConfig): tenantId (+ optional
authority) maps to the contract's tokenUrl, clientSecretRef to
clientSecretEnvVar, audience/scopes collapse via ExchangeTarget(), and the
subject source is selected by subjectTokenProviderName. There is
deliberately no externalTokenHeaderName -- the OBO subject comes from the
authenticated Identity, not a request header.

The schema is structurally valid upstream but inert: an OBO-typed config
still surfaces Valid=False / Reason=EnterpriseRequired at reconcile because
no OBO handler is registered in upstream builds. Field-level validation
lives in kubebuilder markers plus a CEL rule (admission) and the enterprise
handler (reconcile); the Go Validate() arm continues to defer.

Regenerated deepcopy, CRD manifests, and CRD API docs. Added an envtest
suite exercising the new admission-time validation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address code review and a field-by-field check against the downstream
enterprise OBO consumers (the obo.MiddlewareParameters contract and the
entra exchanger/cache/runtime that consume it):

- Tighten tenantId to a GUID-or-domain pattern mirroring the exchanger's
  validateTenant, so a tenantId admitted by the CRD is one the runtime can
  consume. The previous loose pattern admitted aliases like "common" that
  the exchanger rejects, creating an admission/reconcile gap.
- Correct the authority field: the exchanger deliberately allows a path
  (sovereign / B2C / CIAM endpoints use different token paths), so keep the
  path-permitting pattern and fix the doc comment that wrongly claimed "no
  path".
- Require a non-blank audience or scope at admission (CEL trim()/exists),
  mirroring ExchangeTarget()'s trimming so a whitespace-only value is
  rejected up front rather than only at reconcile. Bound scopes (MaxItems +
  per-item length) to keep the CEL rule within the apiserver cost budget.
- Make clientId and clientSecretRef optional at the CRD level, enforced by
  the operator per auth mode, so certificate / workload-identity client auth
  can be added later without a breaking schema change.

Regenerated CRD manifests and API docs; extended the envtest CEL suite
(16 specs) to cover the tightened rules through a real apiserver.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the size/L Large PR: 600-999 lines changed label Jun 10, 2026
The OBOConfig field-mapping doc comment carried a bare "#1581" issue
reference that does not resolve in this repository and is noise in the
generated CRD descriptions. Describe the operator's OBO handler in prose
instead. Regenerated CRD manifests and API docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codecov Bot commented Jun 10, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.47%. Comparing base (ad127f7) to head (d2d5e8d).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5494      +/-   ##
==========================================
- Coverage   69.48%   69.47%   -0.02%     
==========================================
  Files         638      638              
  Lines       65042    65042              
==========================================
- Hits        45192    45185       -7     
- Misses      16529    16537       +8     
+ Partials     3321     3320       -1     


@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 10, 2026
The CRD schema-compatibility check failed: spec.obo shipped as an empty
placeholder ({}) in v0.29.3, whose schema admitted any stored object with
obo: {}. Marking the new tenantId field required (NoNewRequiredFields) and
adding a CEL rule that rejects {} both narrow that released schema and would
invalidate already-stored objects.

Make every OBOConfig field optional and drop the audience-or-scopes CEL rule
so the schema keeps admitting obo: {} and any subset of fields. Presence and
combination requirements (a tenant, a client-auth credential, at least one of
audience or scopes) are enforced by the registered OBO handler at reconcile,
reported as Valid=False / Reason=InvalidConfig. Per-field patterns and bounds
remain and validate only values that are present.

Regenerated CRD manifests and API docs. Reworked the envtest suite to assert
the empty placeholder and partial configs are admitted while the per-field
patterns still reject malformed values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026

@tgrunnagle tgrunnagle left a comment


Multi-agent review — OBOConfig CRD schema

Reviewed at commit 1e47a82 (after "Keep OBOConfig schema backward compatible"), with 6 specialist reviewers plus a downstream cross-reference against stacklok-enterprise-platform. Recommendation: COMMENT — no blocking issues; the schema design is sound and backward-compatible.

✅ Downstream cross-reference (the headline check)

Every field the downstream obo.MiddlewareParameters contract + Entra exchanger need has a source in this CRD: tenantId (+ authority) → tokenUrl, clientId, clientSecretRef → clientSecretEnvVar, audience, scopes, subjectTokenProviderName, cacheSkew → cacheSkewSeconds. The tenantId pattern + 253 cap match the exchanger's validateTenant exactly, and alias rejection (common/organizations/consumers) matches. No missing field, no orphan field.

Trade-off worth noting: making everything optional (the right call for the v0.29.3 {} round-trip) moves the contract's presence requirements (tokenUrl/clientId/clientSecretEnvVar/non-empty exchange target) entirely to the reconcile-time handler — which is still a skeleton (ErrNotYetImplemented) downstream. See F11 below.

Findings

| # | Finding | Severity |
| --- | --- | --- |
| F2 | authority pattern admits embedded userinfo (@): host confusion at the credential trust boundary | MEDIUM |
| F11 | Field binding unverified vs compiled downstream accessors (handler/converter are skeletons) | MEDIUM |
| F4 | Pattern reject specs use bare ShouldNot(Succeed()), so they can pass for the wrong reason | MEDIUM |
| F5 | No authority query/fragment rejection test | MEDIUM |
| F6 | cacheSkew negative unguarded; comment promises rejection by skeleton code | LOW |
| F10 | authority comment overstates parity (CRD is stricter than the runtime) | LOW |
| F7 | Missing boundary tests (tenant 253, scopes 20/256, isolated valid subject) | LOW |
| F8 | subjectTokenProviderName vs sibling subjectProviderName inconsistency | LOW |
| F9 | Doc-comment density / cross-repo drift risk | LOW |

Inline comments below cover F2 and F4–F10.

F11 (cross-repo, not line-specific): the operator handler (enterprise/.../pkg/auth/obo/operator/handler.go) and vMCP converter are skeletons returning ErrNotYetImplemented, so no compiled downstream code reads obo.TenantID / obo.CacheSkew / obo.ClientSecretRef yet. The field names/types are verified only against the contract + issue #1578. Recommend adding a compile-time reference test on the #1581 handler PR that reads every OBOConfig field (including the cacheSkew Duration→seconds conversion and negative rejection) so the binding is locked when the submodule bumps.

What's solid

Backward-compat pivot is correct (preserves the v0.29.3 {} placeholder round-trip); no ReDoS (RE2 is linear); tenantId is injection-safe (no path metacharacters); secret boundary correct (only the env-var name travels in the contract); deepcopy correctly deep-copies the new pointer/slice/Duration fields; both CRD manifests + crd-api.md regenerated consistently; Validate() correctly preserves the ErrEnterpriseRequired reconcile deferral.

🤖 Generated with Claude Code

Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go Outdated
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go Outdated
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go Outdated
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go
Comment thread cmd/thv-operator/test-integration/mcp-external-auth/obo_validation_test.go Outdated
Comment thread cmd/thv-operator/test-integration/mcp-external-auth/obo_validation_test.go Outdated
Comment thread cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go
tgrunnagle and others added 2 commits June 11, 2026 08:46
Addresses #5494 review comments:
- MEDIUM authority (3397159713): the pattern admitted embedded userinfo,
  so https://login.microsoftonline.com@attacker.example was accepted while
  the real host (per RFC 3986) is attacker.example — host confusion at the
  credential trust boundary. Exclude "@" from the pattern.
- LOW authority doc (3397159761): note the CRD is intentionally stricter
  than the runtime validateHTTPSURL (which accepts http loopback and a
  trailing slash), rather than implying exact parity.
- LOW cacheSkew doc (3397159770): the negative-skew rejection is an
  enterprise-handler concern, not enforced at admission or upstream; soften
  the comment so it does not promise rejection that no current build does.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
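
The host confusion behind the authority fix is easy to reproduce with Go's standard URL parser, which follows RFC 3986 in treating everything before "@" as userinfo. The sketch below also includes a hypothetical `authorityRe` pattern that excludes "@", query, fragment, and a trailing slash; it is an assumption written for illustration, not the pattern shipped in the CRD.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// authorityRe is a hypothetical admission pattern: HTTPS only, an optional
// path, and no userinfo ("@"), query ("?"), fragment ("#"), or trailing slash.
var authorityRe = regexp.MustCompile(`^https://[^\s/?#@]+(/[^\s?#@]*[^\s/?#@])?$`)

// realHost returns the host a client would actually connect to.
func realHost(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	raw := "https://login.microsoftonline.com@attacker.example"
	fmt.Println(realHost(raw))                // prints attacker.example
	fmt.Println(authorityRe.MatchString(raw)) // prints false

	// A sovereign-cloud style base with a path is still accepted.
	fmt.Println(authorityRe.MatchString("https://login.microsoftonline.us/tenant-path"))
}
```

Rejecting "@" at admission means a value that looks like a Microsoft login host can never resolve to a different host at token-exchange time.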
Addresses #5494 review comments:
- MEDIUM test quality (3397159791): pattern-reject specs asserted only
  ShouldNot(Succeed()), so a rejection for an unrelated reason would keep
  them green. Add a shared rejectsWithField helper and assert each error
  names the offending field (tenantId/authority/subjectTokenProviderName).
- MEDIUM coverage (3397159798): add authority reject specs for a query
  string, a fragment, and embedded userinfo (the last locks the userinfo
  fix from the sibling commit).
- LOW coverage (3397159806): add schema-bound specs (tenantId >253 chars,
  >20 scopes, a 257-char scope item) and an isolated accept for a valid
  lowercase subjectTokenProviderName.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026
@tgrunnagle


Re: F11 (field binding vs compiled downstream accessors) — Agreed, and deferred by design. At this stage the schema lands first; the operator handler and vMCP converter that will read OBOConfig are still skeletons, so there is no compiled code here to bind against. The field names/types are verified against the runtime wire contract and the issue. The compile-time reference test you suggest — reading every OBOConfig field, including the cacheSkew Duration→seconds conversion and negative rejection — belongs on the downstream handler PR, where it can actually compile against the accessors and lock the binding when the submodule bumps. Tracking it as an acceptance item there.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026

@github-actions github-actions Bot left a comment


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 11, 2026
@github-actions github-actions Bot dismissed their stale review June 11, 2026 16:30

Large PR justification has been provided. Thank you!

@github-actions


✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@tgrunnagle tgrunnagle marked this pull request as ready for review June 11, 2026 16:33
@tgrunnagle tgrunnagle requested a review from blkt as a code owner June 11, 2026 16:33
@tgrunnagle tgrunnagle merged commit eb503b5 into main Jun 12, 2026
47 checks passed
@tgrunnagle tgrunnagle deleted the obo-impl-5 branch June 12, 2026 15:42