Define OBOConfig CRD schema for Entra OBO flow by tgrunnagle · Pull Request #5494 · stacklok/toolhive

tgrunnagle · 2026-06-10T22:55:23Z

Summary

The OBOConfig struct on MCPExternalAuthConfigSpec (the obo external-auth
type, for the Microsoft Entra On-Behalf-Of flow) was an empty placeholder
(type OBOConfig struct{}), so the OBO type had no config surface to read.
This populates OBOConfig with a real schema so the on-behalf-of flow has
something to configure. The schema is structurally valid in upstream (OSS)
builds but inert: an obo-typed config still surfaces
status.conditions[Valid] = False / Reason: EnterpriseRequired at reconcile,
because no OBO handler is registered in upstream builds. Admission validates
per-field shape (patterns and length/item bounds) via kubebuilder markers; all
presence and combination requirements are enforced by the registered handler at
reconcile, and the Go Validate() OBO arm defers to it.

Why: the OBO external-auth type needs a user-facing config surface for the
Entra OBO flow. The struct was a deferred placeholder, so nothing could be
configured and the schema only admitted obo: {}.
What: define the OBOConfig fields, regenerate deepcopy / both CRD
manifests / the CRD API docs, and add an envtest suite that exercises the new
admission-time validation through a real apiserver.

New OBOConfig fields:

tenantId — optional at the CRD level (the operator enforces presence);
constrained to a GUID-or-domain pattern when set (well-known aliases like
common are rejected, since an OBO confidential-client exchange must target a
specific tenant).
authority — optional HTTPS base for sovereign / B2C / CIAM clouds; a path is
permitted and is prefixed before the tenant segment.
clientId and clientSecretRef — optional at the CRD level (enforced by the
operator per auth mode), left optional so certificate / workload-identity
client auth can be added later without a breaking schema change.
audience and/or scopes — at least one is required; the operator enforces
that at reconcile (not at admission), so the schema keeps admitting the empty
obo: {} placeholder shipped in v0.29.3.
subjectTokenProviderName — optional; selects the OBO subject source (inbound
end-user token vs. a named upstream provider's token).
cacheSkew — optional metav1.Duration for the token cache's expiry skew.

Field names and semantics track the shared OBO runtime wire contract. There is
deliberately no externalTokenHeaderName: the OBO subject is sourced from the
authenticated Identity, never from an inbound request header.

Large PR Justification

The PR exceeds 1000 lines because it is dominated by generated artifacts that
must land atomically with the type definition that produces them:

~510 lines — regenerated CRD manifests. Adding the eight OBOConfig
fields regenerates the spec.obo schema in both CRD versions (v1beta1 and
v1alpha1) and in both copies the chart ships (files/crds/ and the Helm
templates/).
~55 lines — regenerated CRD API reference (docs/operator/crd-api.md).
~17 lines — regenerated deepcopy (zz_generated.deepcopy.go).

That generated output (~580 lines) is mechanical and cannot be split from the
source without leaving the tree non-buildable / non-regenerable. The
hand-written change is a single logical unit — one CRD type (OBOConfig,
~185 lines) plus its envtest suite (~290 lines). Per the contributing
guidelines, generated code and test-only changes are accepted reasons for a
larger PR.

Type of change

New feature

Test plan

Unit tests (task operator-test) — the api/v1beta1 package, including
the updated Validate() cases covering a fully populated OBOConfig and a
minimal OBOConfig that passes the Go method (field checks are deferred to
admission / reconcile).
Linting (task lint-fix) — clean for the changed files.
Manual testing (describe below)

Ran the new envtest integration suite (task operator-test-integration,
mcp-external-auth): 20 specs validating the kubebuilder schema through a real
apiserver — the empty obo: {} placeholder and partial configs are admitted,
while per-field patterns and bounds reject malformed values (non-HTTPS /
userinfo / query / fragment / trailing-slash authority, GUID/domain tenantId
incl. the 253-char cap, subjectTokenProviderName DNS-label, and the scopes
count/length caps). All pass.

API Compatibility

This PR does not break the v1beta1 API, OR the api-break-allowed label
is applied and the migration guidance is described above.

This is additive: OBOConfig was previously an empty object and only new
optional/required-within-obo fields are introduced. Stored obo: {} objects
do not exist in practice (the type was inert), and the obo block is only
required when type is obo.

Changes

File	Change
`cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.go`	Populate `OBOConfig` with the Entra OBO fields, kubebuilder markers, and the audience-or-scopes CEL rule; update doc comments and the `Validate()` OBO arm rationale.
`cmd/thv-operator/api/v1beta1/zz_generated.deepcopy.go`	Regenerated deepcopy for the new `OBOConfig` fields.
`cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types_test.go`	Update `Validate()` cases: a fully populated `OBOConfig` and a minimal block that passes Go validation (field checks deferred).
`cmd/thv-operator/test-integration/mcp-external-auth/obo_validation_test.go`	New envtest suite (16 specs) exercising the CEL rule and markers via a real apiserver.
`deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml`	Regenerated CRD manifest.
`deploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcpexternalauthconfigs.yaml`	Regenerated CRD manifest.
`docs/operator/crd-api.md`	Regenerated CRD API docs for `OBOConfig`.

Does this introduce a user-facing change?

Yes. The obo external-auth type now exposes a configurable schema
(tenantId, authority, clientId, clientSecretRef, audience, scopes,
subjectTokenProviderName, cacheSkew) with admission-time validation. In
upstream builds the type remains inert — an obo-typed MCPExternalAuthConfig
still reports Valid=False / Reason=EnterpriseRequired at reconcile because no
OBO handler is registered.

Special notes for reviewers

subjectTokenProviderName vs sibling subjectProviderName. Other configs
in this file (TokenExchangeConfig, AWSStsConfig) name the equivalent field
subjectProviderName. OBOConfig deliberately uses subjectTokenProviderName
to track the downstream obo.MiddlewareParameters field name exactly — the
divergence is intentional (documented in the struct doc), not an oversight.
No required fields and no cross-field CEL rule. spec.obo shipped as an
empty {} placeholder in v0.29.3, so the schema must keep admitting {} and
any subset of fields to preserve that round-trip (the CRD schema-compatibility
check enforces this). The Go Validate() OBO arm likewise does not check
OBOConfig fields. All presence/combination requirements — a tenant, a
client-auth credential, at least one of audience/scopes, a non-negative
cacheSkew — are owned by the registered OBO handler at reconcile, which
reports violations as Valid=False / Reason=InvalidConfig. Upstream builds
stay inert (Reason=EnterpriseRequired).
Admission validates per-field shape only. Patterns and length/item bounds
(tenantId GUID/domain + 253 cap, authority HTTPS with no userinfo/query/
fragment/trailing-slash, subjectTokenProviderName DNS-label, scopes caps)
validate values that are present. The authority pattern is intentionally
stricter than the downstream validateHTTPSURL (which accepts http-loopback
and a trailing slash) — rejecting those, and userinfo, at admission is the
safe direction.

Generated with Claude Code

The mcpv1beta1.OBOConfig struct was an empty placeholder deferred to a follow-up RFC. The enterprise OBO overlay needs a user-facing config surface to read, so populate OBOConfig with the fields the Microsoft Entra OBO flow requires. Field names and semantics track the shared obo.MiddlewareParameters wire contract (not the upstream TokenExchangeConfig): tenantId (+ optional authority) maps to the contract's tokenUrl, clientSecretRef to clientSecretEnvVar, audience/scopes collapse via ExchangeTarget(), and the subject source is selected by subjectTokenProviderName. There is deliberately no externalTokenHeaderName -- the OBO subject comes from the authenticated Identity, not a request header. The schema is structurally valid upstream but inert: an OBO-typed config still surfaces Valid=False / Reason=EnterpriseRequired at reconcile because no OBO handler is registered in upstream builds. Field-level validation lives in kubebuilder markers plus a CEL rule (admission) and the enterprise handler (reconcile); the Go Validate() arm continues to defer. Regenerated deepcopy, CRD manifests, and CRD API docs. Added an envtest suite exercising the new admission-time validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address code review and a field-by-field check against the downstream enterprise OBO consumers (the obo.MiddlewareParameters contract and the entra exchanger/cache/runtime that consume it): - Tighten tenantId to a GUID-or-domain pattern mirroring the exchanger's validateTenant, so a tenantId admitted by the CRD is one the runtime can consume. The previous loose pattern admitted aliases like "common" that the exchanger rejects, creating an admission/reconcile gap. - Correct the authority field: the exchanger deliberately allows a path (sovereign / B2C / CIAM endpoints use different token paths), so keep the path-permitting pattern and fix the doc comment that wrongly claimed "no path". - Require a non-blank audience or scope at admission (CEL trim()/exists), mirroring ExchangeTarget()'s trimming so a whitespace-only value is rejected up front rather than only at reconcile. Bound scopes (MaxItems + per-item length) to keep the CEL rule within the apiserver cost budget. - Make clientId and clientSecretRef optional at the CRD level, enforced by the operator per auth mode, so certificate / workload-identity client auth can be added later without a breaking schema change. Regenerated CRD manifests and API docs; extended the envtest CEL suite (16 specs) to cover the tightened rules through a real apiserver. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The OBOConfig field-mapping doc comment carried a bare "#1581" issue reference that does not resolve in this repository and is noise in the generated CRD descriptions. Describe the operator's OBO handler in prose instead. Regenerated CRD manifests and API docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-10T23:03:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.47%. Comparing base (ad127f7) to head (d2d5e8d).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5494      +/-   ##
==========================================
- Coverage   69.48%   69.47%   -0.02%     
==========================================
  Files         638      638              
  Lines       65042    65042              
==========================================
- Hits        45192    45185       -7     
- Misses      16529    16537       +8     
+ Partials     3321     3320       -1

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The CRD schema-compatibility check failed: spec.obo shipped as an empty placeholder ({}) in v0.29.3, whose schema admitted any stored object with obo: {}. Marking the new tenantId field required (NoNewRequiredFields) and adding a CEL rule that rejects {} both narrow that released schema and would invalidate already-stored objects. Make every OBOConfig field optional and drop the audience-or-scopes CEL rule so the schema keeps admitting obo: {} and any subset of fields. Presence and combination requirements (a tenant, a client-auth credential, at least one of audience or scopes) are enforced by the registered OBO handler at reconcile, reported as Valid=False / Reason=InvalidConfig. Per-field patterns and bounds remain and validate only values that are present. Regenerated CRD manifests and API docs. Reworked the envtest suite to assert the empty placeholder and partial configs are admitted while the per-field patterns still reject malformed values. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tgrunnagle

Multi-agent review — OBOConfig CRD schema

Reviewed at commit 1e47a82 (after "Keep OBOConfig schema backward compatible"), with 6 specialist reviewers plus a downstream cross-reference against stacklok-enterprise-platform. Recommendation: COMMENT — no blocking issues; the schema design is sound and backward-compatible.

✅ Downstream cross-reference (the headline check)

Every field the downstream obo.MiddlewareParameters contract + Entra exchanger need has a source in this CRD: tenantId (+ authority) → tokenUrl, clientId, clientSecretRef → clientSecretEnvVar, audience, scopes, subjectTokenProviderName, cacheSkew → cacheSkewSeconds. The tenantId pattern + 253 cap match the exchanger's validateTenant exactly, and alias rejection (common/organizations/consumers) matches. No missing field, no orphan field.

Trade-off worth noting: making everything optional (the right call for the v0.29.3 {} round-trip) moves the contract's presence requirements (tokenUrl/clientId/clientSecretEnvVar/non-empty exchange target) entirely to the reconcile-time handler — which is still a skeleton (ErrNotYetImplemented) downstream. See F11 below.

Findings

#	Finding	Severity
F2	`authority` pattern admits embedded userinfo (`@`) — host confusion at the credential trust boundary	MEDIUM
F11	Field binding unverified vs compiled downstream accessors (handler/converter are skeletons)	MEDIUM
F4	Pattern reject specs use bare `ShouldNot(Succeed())` — can pass for the wrong reason	MEDIUM
F5	No `authority` query/fragment rejection test	MEDIUM
F6	`cacheSkew` negative unguarded; comment promises rejection by skeleton code	LOW
F10	`authority` comment overstates parity (CRD is stricter than the runtime)	LOW
F7	Missing boundary tests (tenant 253, scopes 20/256, isolated valid subject)	LOW
F8	`subjectTokenProviderName` vs sibling `subjectProviderName` inconsistency	LOW
F9	Doc-comment density / cross-repo drift risk	LOW

Inline comments below cover F2 and F4–F10.

F11 (cross-repo, not line-specific): the operator handler (enterprise/.../pkg/auth/obo/operator/handler.go) and vMCP converter are skeletons returning ErrNotYetImplemented, so no compiled downstream code reads obo.TenantID / obo.CacheSkew / obo.ClientSecretRef yet. The field names/types are verified only against the contract + issue #1578. Recommend adding a compile-time reference test on the #1581 handler PR that reads every OBOConfig field (including the cacheSkew Duration→seconds conversion and negative rejection) so the binding is locked when the submodule bumps.

What's solid

Backward-compat pivot is correct (preserves the v0.29.3 {} placeholder round-trip); no ReDoS (RE2 is linear); tenantId is injection-safe (no path metacharacters); secret boundary correct (only the env-var name travels in the contract); deepcopy correctly deep-copies the new pointer/slice/Duration fields; both CRD manifests + crd-api.md regenerated consistently; Validate() correctly preserves the ErrEnterpriseRequired reconcile deferral.

🤖 Generated with Claude Code

Addresses #5494 review comments: - MEDIUM authority (3397159713): the pattern admitted embedded userinfo, so https://login.microsoftonline.com@attacker.example was accepted while the real host (per RFC 3986) is attacker.example — host confusion at the credential trust boundary. Exclude "@" from the pattern. - LOW authority doc (3397159761): note the CRD is intentionally stricter than the runtime validateHTTPSURL (which accepts http loopback and a trailing slash), rather than implying exact parity. - LOW cacheSkew doc (3397159770): the negative-skew rejection is an enterprise-handler concern, not enforced at admission or upstream; soften the comment so it does not promise rejection that no current build does. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Addresses #5494 review comments: - MEDIUM test quality (3397159791): pattern-reject specs asserted only ShouldNot(Succeed()), so a rejection for an unrelated reason would keep them green. Add a shared rejectsWithField helper and assert each error names the offending field (tenantId/authority/subjectTokenProviderName). - MEDIUM coverage (3397159798): add authority reject specs for a query string, a fragment, and embedded userinfo (the last locks the userinfo fix from the sibling commit). - LOW coverage (3397159806): add schema-bound specs (tenantId >253 chars, >20 scopes, a 257-char scope item) and an isolated accept for a valid lowercase subjectTokenProviderName. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tgrunnagle · 2026-06-11T16:03:56Z

Re: F11 (field binding vs compiled downstream accessors) — Agreed, and deferred by design. At this stage the schema lands first; the operator handler and vMCP converter that will read OBOConfig are still skeletons, so there is no compiled code here to bind against. The field names/types are verified against the runtime wire contract and the issue. The compile-time reference test you suggest — reading every OBOConfig field, including the cacheSkew Duration→seconds conversion and negative rejection — belongs on the downstream handler PR, where it can actually compile against the accessors and lock the binding when the submodule bumps. Tracking it as an acceptance item there.

github-actions

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.

This review will be automatically dismissed once you add the justification section.

Large PR justification has been provided. Thank you!

github-actions · 2026-06-11T16:30:51Z

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

tgrunnagle and others added 2 commits June 10, 2026 14:48

github-actions Bot added the size/L Large PR: 600-999 lines changed label Jun 10, 2026

github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 10, 2026

github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026

tgrunnagle commented Jun 11, 2026

View reviewed changes

tgrunnagle and others added 2 commits June 11, 2026 08:46

github-actions Bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/L Large PR: 600-999 lines changed labels Jun 11, 2026

github-actions Bot previously requested changes Jun 11, 2026

View reviewed changes

github-actions Bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Jun 11, 2026

tgrunnagle marked this pull request as ready for review June 11, 2026 16:33

tgrunnagle requested review from ChrisJBurns, JAORMX, jerm-dro, jhrozek, rdimitrov and reyortiz3 as code owners June 11, 2026 16:33

tgrunnagle requested a review from blkt as a code owner June 11, 2026 16:33

jhrozek approved these changes Jun 11, 2026

View reviewed changes

tgrunnagle merged commit eb503b5 into main Jun 12, 2026
47 checks passed

tgrunnagle deleted the obo-impl-5 branch June 12, 2026 15:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define OBOConfig CRD schema for Entra OBO flow#5494

Define OBOConfig CRD schema for Entra OBO flow#5494
tgrunnagle merged 6 commits into
mainfrom
obo-impl-5

tgrunnagle commented Jun 10, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Jun 10, 2026 •

edited

Loading

Uh oh!

tgrunnagle left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tgrunnagle commented Jun 11, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tgrunnagle commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Large PR Justification

Type of change

Test plan

API Compatibility

Changes

Does this introduce a user-facing change?

Special notes for reviewers

Uh oh!

codecov Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

tgrunnagle left a comment

Choose a reason for hiding this comment

Multi-agent review — OBOConfig CRD schema

✅ Downstream cross-reference (the headline check)

Findings

What's solid

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tgrunnagle commented Jun 11, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Large PR Detected

How to unblock this PR:

Alternative:

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tgrunnagle commented Jun 10, 2026 •

edited

Loading

codecov Bot commented Jun 10, 2026 •

edited

Loading