Define OBOConfig CRD schema for Entra OBO flow#5494
Conversation
The mcpv1beta1.OBOConfig struct was an empty placeholder deferred to a follow-up RFC. The enterprise OBO overlay needs a user-facing config surface to read, so populate OBOConfig with the fields the Microsoft Entra OBO flow requires. Field names and semantics track the shared obo.MiddlewareParameters wire contract (not the upstream TokenExchangeConfig): tenantId (+ optional authority) maps to the contract's tokenUrl, clientSecretRef to clientSecretEnvVar, audience/scopes collapse via ExchangeTarget(), and the subject source is selected by subjectTokenProviderName. There is deliberately no externalTokenHeaderName -- the OBO subject comes from the authenticated Identity, not a request header. The schema is structurally valid upstream but inert: an OBO-typed config still surfaces Valid=False / Reason=EnterpriseRequired at reconcile because no OBO handler is registered in upstream builds. Field-level validation lives in kubebuilder markers plus a CEL rule (admission) and the enterprise handler (reconcile); the Go Validate() arm continues to defer. Regenerated deepcopy, CRD manifests, and CRD API docs. Added an envtest suite exercising the new admission-time validation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address code review and a field-by-field check against the downstream enterprise OBO consumers (the obo.MiddlewareParameters contract and the entra exchanger/cache/runtime that consume it): - Tighten tenantId to a GUID-or-domain pattern mirroring the exchanger's validateTenant, so a tenantId admitted by the CRD is one the runtime can consume. The previous loose pattern admitted aliases like "common" that the exchanger rejects, creating an admission/reconcile gap. - Correct the authority field: the exchanger deliberately allows a path (sovereign / B2C / CIAM endpoints use different token paths), so keep the path-permitting pattern and fix the doc comment that wrongly claimed "no path". - Require a non-blank audience or scope at admission (CEL trim()/exists), mirroring ExchangeTarget()'s trimming so a whitespace-only value is rejected up front rather than only at reconcile. Bound scopes (MaxItems + per-item length) to keep the CEL rule within the apiserver cost budget. - Make clientId and clientSecretRef optional at the CRD level, enforced by the operator per auth mode, so certificate / workload-identity client auth can be added later without a breaking schema change. Regenerated CRD manifests and API docs; extended the envtest CEL suite (16 specs) to cover the tightened rules through a real apiserver. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The OBOConfig field-mapping doc comment carried a bare "#1581" issue reference that does not resolve in this repository and is noise in the generated CRD descriptions. Describe the operator's OBO handler in prose instead. Regenerated CRD manifests and API docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5494 +/- ##
==========================================
- Coverage 69.48% 69.47% -0.02%
==========================================
Files 638 638
Lines 65042 65042
==========================================
- Hits 45192 45185 -7
- Misses 16529 16537 +8
+ Partials 3321 3320 -1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
The CRD schema-compatibility check failed: spec.obo shipped as an empty
placeholder ({}) in v0.29.3, whose schema admitted any stored object with
obo: {}. Marking the new tenantId field required (NoNewRequiredFields) and
adding a CEL rule that rejects {} both narrow that released schema and would
invalidate already-stored objects.
Make every OBOConfig field optional and drop the audience-or-scopes CEL rule
so the schema keeps admitting obo: {} and any subset of fields. Presence and
combination requirements (a tenant, a client-auth credential, at least one of
audience or scopes) are enforced by the registered OBO handler at reconcile,
reported as Valid=False / Reason=InvalidConfig. Per-field patterns and bounds
remain and validate only values that are present.
Regenerated CRD manifests and API docs. Reworked the envtest suite to assert
the empty placeholder and partial configs are admitted while the per-field
patterns still reject malformed values.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tgrunnagle
left a comment
There was a problem hiding this comment.
Multi-agent review — OBOConfig CRD schema
Reviewed at commit 1e47a82 (after "Keep OBOConfig schema backward compatible"), with 6 specialist reviewers plus a downstream cross-reference against stacklok-enterprise-platform. Recommendation: COMMENT — no blocking issues; the schema design is sound and backward-compatible.
✅ Downstream cross-reference (the headline check)
Every field the downstream obo.MiddlewareParameters contract + Entra exchanger need has a source in this CRD: tenantId (+ authority) → tokenUrl, clientId, clientSecretRef → clientSecretEnvVar, audience, scopes, subjectTokenProviderName, cacheSkew → cacheSkewSeconds. The tenantId pattern + 253 cap match the exchanger's validateTenant exactly, and alias rejection (common/organizations/consumers) matches. No missing field, no orphan field.
Trade-off worth noting: making everything optional (the right call for the v0.29.3 {} round-trip) moves the contract's presence requirements (tokenUrl/clientId/clientSecretEnvVar/non-empty exchange target) entirely to the reconcile-time handler — which is still a skeleton (ErrNotYetImplemented) downstream. See F11 below.
Findings
| # | Finding | Severity |
|---|---|---|
| F2 | authority pattern admits embedded userinfo (@) — host confusion at the credential trust boundary |
MEDIUM |
| F11 | Field binding unverified vs compiled downstream accessors (handler/converter are skeletons) | MEDIUM |
| F4 | Pattern reject specs use bare ShouldNot(Succeed()) — can pass for the wrong reason |
MEDIUM |
| F5 | No authority query/fragment rejection test |
MEDIUM |
| F6 | cacheSkew negative unguarded; comment promises rejection by skeleton code |
LOW |
| F10 | authority comment overstates parity (CRD is stricter than the runtime) |
LOW |
| F7 | Missing boundary tests (tenant 253, scopes 20/256, isolated valid subject) | LOW |
| F8 | subjectTokenProviderName vs sibling subjectProviderName inconsistency |
LOW |
| F9 | Doc-comment density / cross-repo drift risk | LOW |
Inline comments below cover F2 and F4–F10.
F11 (cross-repo, not line-specific): the operator handler (enterprise/.../pkg/auth/obo/operator/handler.go) and vMCP converter are skeletons returning ErrNotYetImplemented, so no compiled downstream code reads obo.TenantID / obo.CacheSkew / obo.ClientSecretRef yet. The field names/types are verified only against the contract + issue #1578. Recommend adding a compile-time reference test on the #1581 handler PR that reads every OBOConfig field (including the cacheSkew Duration→seconds conversion and negative rejection) so the binding is locked when the submodule bumps.
What's solid
Backward-compat pivot is correct (preserves the v0.29.3 {} placeholder round-trip); no ReDoS (RE2 is linear); tenantId is injection-safe (no path metacharacters); secret boundary correct (only the env-var name travels in the contract); deepcopy correctly deep-copies the new pointer/slice/Duration fields; both CRD manifests + crd-api.md regenerated consistently; Validate() correctly preserves the ErrEnterpriseRequired reconcile deferral.
🤖 Generated with Claude Code
Addresses #5494 review comments: - MEDIUM authority (3397159713): the pattern admitted embedded userinfo, so https://login.microsoftonline.com@attacker.example was accepted while the real host (per RFC 3986) is attacker.example — host confusion at the credential trust boundary. Exclude "@" from the pattern. - LOW authority doc (3397159761): note the CRD is intentionally stricter than the runtime validateHTTPSURL (which accepts http loopback and a trailing slash), rather than implying exact parity. - LOW cacheSkew doc (3397159770): the negative-skew rejection is an enterprise-handler concern, not enforced at admission or upstream; soften the comment so it does not promise rejection that no current build does. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses #5494 review comments: - MEDIUM test quality (3397159791): pattern-reject specs asserted only ShouldNot(Succeed()), so a rejection for an unrelated reason would keep them green. Add a shared rejectsWithField helper and assert each error names the offending field (tenantId/authority/subjectTokenProviderName). - MEDIUM coverage (3397159798): add authority reject specs for a query string, a fragment, and embedded userinfo (the last locks the userinfo fix from the sibling commit). - LOW coverage (3397159806): add schema-bound specs (tenantId >253 chars, >20 scopes, a 257-char scope item) and an isolated accept for a valid lowercase subjectTokenProviderName. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Re: F11 (field binding vs compiled downstream accessors) — Agreed, and deferred by design. At this stage the schema lands first; the operator handler and vMCP converter that will read |
There was a problem hiding this comment.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Large PR justification has been provided. Thank you!
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Summary
The
OBOConfigstruct onMCPExternalAuthConfigSpec(theoboexternal-authtype, for the Microsoft Entra On-Behalf-Of flow) was an empty placeholder
(
type OBOConfig struct{}), so the OBO type had no config surface to read.This populates
OBOConfigwith a real schema so the on-behalf-of flow hassomething to configure. The schema is structurally valid in upstream (OSS)
builds but inert: an
obo-typed config still surfacesstatus.conditions[Valid] = False/Reason: EnterpriseRequiredat reconcile,because no OBO handler is registered in upstream builds. Admission validates
per-field shape (patterns and length/item bounds) via kubebuilder markers; all
presence and combination requirements are enforced by the registered handler at
reconcile, and the Go
Validate()OBO arm defers to it.Entra OBO flow. The struct was a deferred placeholder, so nothing could be
configured and the schema only admitted
obo: {}.OBOConfigfields, regenerate deepcopy / both CRDmanifests / the CRD API docs, and add an envtest suite that exercises the new
admission-time validation through a real apiserver.
New
OBOConfigfields:tenantId— optional at the CRD level (the operator enforces presence);constrained to a GUID-or-domain pattern when set (well-known aliases like
commonare rejected, since an OBO confidential-client exchange must target aspecific tenant).
authority— optional HTTPS base for sovereign / B2C / CIAM clouds; a path ispermitted and is prefixed before the tenant segment.
clientIdandclientSecretRef— optional at the CRD level (enforced by theoperator per auth mode), left optional so certificate / workload-identity
client auth can be added later without a breaking schema change.
audienceand/orscopes— at least one is required; the operator enforcesthat at reconcile (not at admission), so the schema keeps admitting the empty
obo: {}placeholder shipped in v0.29.3.subjectTokenProviderName— optional; selects the OBO subject source (inboundend-user token vs. a named upstream provider's token).
cacheSkew— optionalmetav1.Durationfor the token cache's expiry skew.Field names and semantics track the shared OBO runtime wire contract. There is
deliberately no
externalTokenHeaderName: the OBO subject is sourced from theauthenticated
Identity, never from an inbound request header.Large PR Justification
The PR exceeds 1000 lines because it is dominated by generated artifacts that
must land atomically with the type definition that produces them:
OBOConfigfields regenerates the
spec.oboschema in both CRD versions (v1beta1andv1alpha1) and in both copies the chart ships (files/crds/and the Helmtemplates/).docs/operator/crd-api.md).zz_generated.deepcopy.go).That generated output (~580 lines) is mechanical and cannot be split from the
source without leaving the tree non-buildable / non-regenerable. The
hand-written change is a single logical unit — one CRD type (
OBOConfig,~185 lines) plus its envtest suite (~290 lines). Per the contributing
guidelines, generated code and test-only changes are accepted reasons for a
larger PR.
Type of change
Test plan
task operator-test) — theapi/v1beta1package, includingthe updated
Validate()cases covering a fully populatedOBOConfigand aminimal
OBOConfigthat passes the Go method (field checks are deferred toadmission / reconcile).
task lint-fix) — clean for the changed files.Ran the new envtest integration suite (
task operator-test-integration,mcp-external-auth): 20 specs validating the kubebuilder schema through a realapiserver — the empty
obo: {}placeholder and partial configs are admitted,while per-field patterns and bounds reject malformed values (non-HTTPS /
userinfo / query / fragment / trailing-slash authority, GUID/domain
tenantIdincl. the 253-char cap,
subjectTokenProviderNameDNS-label, and the scopescount/length caps). All pass.
API Compatibility
v1beta1API, OR theapi-break-allowedlabelis applied and the migration guidance is described above.
This is additive:
OBOConfigwas previously an empty object and only newoptional/required-within-
obofields are introduced. Storedobo: {}objectsdo not exist in practice (the type was inert), and the
oboblock is onlyrequired when
typeisobo.Changes
cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types.goOBOConfigwith the Entra OBO fields, kubebuilder markers, and the audience-or-scopes CEL rule; update doc comments and theValidate()OBO arm rationale.cmd/thv-operator/api/v1beta1/zz_generated.deepcopy.goOBOConfigfields.cmd/thv-operator/api/v1beta1/mcpexternalauthconfig_types_test.goValidate()cases: a fully populatedOBOConfigand a minimal block that passes Go validation (field checks deferred).cmd/thv-operator/test-integration/mcp-external-auth/obo_validation_test.godeploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_mcpexternalauthconfigs.yamldeploy/charts/operator-crds/templates/toolhive.stacklok.dev_mcpexternalauthconfigs.yamldocs/operator/crd-api.mdOBOConfig.Does this introduce a user-facing change?
Yes. The
oboexternal-auth type now exposes a configurable schema(
tenantId,authority,clientId,clientSecretRef,audience,scopes,subjectTokenProviderName,cacheSkew) with admission-time validation. Inupstream builds the type remains inert — an
obo-typedMCPExternalAuthConfigstill reports
Valid=False/Reason=EnterpriseRequiredat reconcile because noOBO handler is registered.
Special notes for reviewers
subjectTokenProviderNamevs siblingsubjectProviderName. Other configsin this file (
TokenExchangeConfig,AWSStsConfig) name the equivalent fieldsubjectProviderName.OBOConfigdeliberately usessubjectTokenProviderNameto track the downstream
obo.MiddlewareParametersfield name exactly — thedivergence is intentional (documented in the struct doc), not an oversight.
spec.oboshipped as anempty
{}placeholder in v0.29.3, so the schema must keep admitting{}andany subset of fields to preserve that round-trip (the CRD schema-compatibility
check enforces this). The Go
Validate()OBO arm likewise does not checkOBOConfigfields. All presence/combination requirements — a tenant, aclient-auth credential, at least one of audience/scopes, a non-negative
cacheSkew— are owned by the registered OBO handler at reconcile, whichreports violations as
Valid=False/Reason=InvalidConfig. Upstream buildsstay inert (
Reason=EnterpriseRequired).(tenantId GUID/domain + 253 cap, authority HTTPS with no userinfo/query/
fragment/trailing-slash, subjectTokenProviderName DNS-label, scopes caps)
validate values that are present. The authority pattern is intentionally
stricter than the downstream
validateHTTPSURL(which accepts http-loopbackand a trailing slash) — rejecting those, and userinfo, at admission is the
safe direction.
Generated with Claude Code