Add opt-in RBAC management UI via rbacManagement flag #4865

Open: dimitri-nicolo wants to merge 13 commits into tigera:master from dimitri-nicolo:dimitri-PMREQ-824-rbac-mgmt

@dimitri-nicolo
Contributor

Description

Type: New feature (Calico Enterprise only). Addresses PMREQ-824.

Adds an opt-in RBAC management UI feature, toggled by a new Manager.spec.rbacManagement.enabled flag. When enabled, the operator renders the additional permissions and network access the UI needs to let an administrator manage role/group assignments from the Manager: maintaining a catalog of managed ClusterRoles, binding IdP groups to roles, and discovering groups from an external LDAP directory. The flag is zero-tenant only and defaults to false, so existing clusters are unaffected until an operator explicitly turns it on.

This should be merged because it's the operator-side enablement for the RBAC management UI; without it the UI's backend has no RBAC or egress to function, and gating keeps the feature entirely dormant (and its escalation-capable permissions un-granted) on clusters that don't use it.

What the flag turns on, by layer

  • API (api/v1/manager_types.go) — new RBACManagement spec struct + nil-safe Manager.RBACManagementEnabled() helper; regenerated deepcopy and CRD. Disabling after enabling does not garbage-collect already-rendered RBAC objects — documented on the CRD surface as one-way.
  • Render
    • calico-kube-controllers: enables the rbacsync controller stanza and its RBAC only when the flag is set (controller is dormant otherwise).
    • calico-manager role: adds the RBAC management UI permissions (full CRUD on managed RBAC objects, escalation-prevention coverage for tier/resource roles, Dex login-cache and IdP-LDAP/IdP-groups access) and opens egress to LDAP ports 389/636.
    • tigera-network-admin (apiserver): grants the bind/escalate-capable rule needed to assign managed roles via the UI — also gated on the flag so non-RBAC-UI clusters never receive these verbs.
    • Shared escalation-prevention rules live in a new pkg/render/rbac_management.go so the manager and kube-controllers roles stay aligned.

Why the installation and apiserver controllers read the Manager CR

The flag conceptually belongs to the Manager, but two of the resources it gates are not rendered by the manager controller:

  • The rbacsync controller runs inside calico-kube-controllers, which is rendered by the installation controller.
  • The tigera-network-admin ClusterRole is rendered by the apiserver controller.

So each of those controllers has to read Manager.spec.rbacManagement.enabled itself to make its gating decision. Both do so via a new nil-safe helper, utils.GetZeroTenantManagerOrNil, and pass the resulting *Manager (or nil) into their render config:

  • Installation controller → KubeControllersConfiguration.Manager, consumed as cfg.Manager.RBACManagementEnabled() to decide whether to append the rbacsync controller + its RBAC.
  • Apiserver controller → APIServerConfiguration.RBACManagementEnabled, used to decide whether to append the bind/escalate rule to tigera-network-admin.

The *Manager (rather than a pre-computed bool) is threaded through where convenient so the nil handling stays in the one helper — RBACManagementEnabled() returns false for a nil receiver, matching the opt-in, fail-safe-off intent.
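
A minimal sketch of the nil-receiver pattern described above, using simplified stand-in types rather than the operator's real api/v1 definitions. The field layout (a pointer to a struct with an `Enabled` bool), the `enabledControllers` helper, and the `"base"` controller name are assumptions for illustration; only the `RBACManagementEnabled` method name and the `rbacsync` controller name come from the description.

```go
package main

// RBACManagement, ManagerSpec and Manager are simplified stand-ins for the
// operator's API types; the real structs carry many more fields.
type RBACManagement struct {
	Enabled bool
}

type ManagerSpec struct {
	RBACManagement *RBACManagement
}

type Manager struct {
	Spec ManagerSpec
}

// RBACManagementEnabled is safe to call on a nil *Manager: a missing Manager
// or a missing rbacManagement stanza both report false (fail-safe off).
func (m *Manager) RBACManagementEnabled() bool {
	if m == nil || m.Spec.RBACManagement == nil {
		return false
	}
	return m.Spec.RBACManagement.Enabled
}

// enabledControllers is a hypothetical render-side gate: it appends the
// rbacsync controller only when the flag is set. "base" is a placeholder
// for whatever controllers are always enabled.
func enabledControllers(m *Manager) []string {
	controllers := []string{"base"}
	if m.RBACManagementEnabled() {
		controllers = append(controllers, "rbacsync")
	}
	return controllers
}
```

Because the method handles the nil receiver itself, a caller that found no Manager CR can pass `nil` straight through to the render config without a separate presence check.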

Why "zero-tenant": in multi-tenant management clusters each tenant has its own namespaced Manager, fetched tenant-aware by the manager controller's existing GetManager. The installation and apiserver controllers are cluster-scoped and have no tenant namespace in scope, and rbacManagement is a zero-tenant-only feature — so the new helper deliberately does a cluster-scoped Get ({Name: "tigera-secure"}, no namespace). Its name is a guardrail: callers needing a tenant-scoped Manager must keep using GetManager; non-manager controllers reading zero-tenant-only flags use GetZeroTenantManagerOrNil. On a multi-tenant cluster the cluster-scoped Manager typically doesn't exist, so the helper returns nil and the feature stays off — the intended fail-safe behavior.
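
The lookup contract in the paragraph above can be sketched as follows. This is not the code in pkg/controller/utils: the `managerGetter` interface, the `errNotFound` sentinel, and the `fakeStore` type are hypothetical stand-ins for the controller-runtime client and the Kubernetes NotFound check, and the context argument a real client call would take is omitted to keep the sketch standard-library only.

```go
package main

import "errors"

// errNotFound stands in for a Kubernetes "not found" API error.
var errNotFound = errors.New("manager not found")

// Manager is a simplified stand-in for the operator's Manager type.
type Manager struct {
	Name string
}

// managerGetter is a hypothetical, narrowed view of a Kubernetes client.
type managerGetter interface {
	Get(namespace, name string) (*Manager, error)
}

// getClusterScopedManagerOrNil does a cluster-scoped lookup (empty namespace)
// of the "tigera-secure" Manager. A missing object is not an error: it yields
// (nil, nil) so the caller treats the feature as off. Other errors propagate.
func getClusterScopedManagerOrNil(g managerGetter) (*Manager, error) {
	m, err := g.Get("", "tigera-secure")
	if errors.Is(err, errNotFound) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return m, nil
}

// fakeStore is an in-memory getter keyed by "namespace/name", for illustration.
type fakeStore map[string]*Manager

func (s fakeStore) Get(namespace, name string) (*Manager, error) {
	m, ok := s[namespace+"/"+name]
	if !ok {
		return nil, errNotFound
	}
	return m, nil
}
```

With an empty `fakeStore`, which loosely models a multi-tenant cluster where no cluster-scoped Manager exists, the function returns `nil, nil`, so a nil-safe flag check downstream reports the feature as disabled.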

Why the new watches: both controllers also add a watch on the Manager CR. Without it, toggling rbacManagement.enabled would not re-trigger an installation/apiserver reconcile, so the rbacsync controller and the network-admin rule wouldn't appear or disappear until some unrelated event happened to re-run those controllers.

Components affected: apiserver (tigera-network-admin), manager (calico-manager role + network policy), kube-controllers (rbacsync), Manager CRD, installation & apiserver controllers.

Release Note

Added an opt-in `Manager.spec.rbacManagement.enabled` flag that enables the RBAC management UI. When enabled, the operator grants the additional RBAC and LDAP egress the UI requires; it defaults to disabled and is supported in zero-tenant management clusters only. Disabling the flag does not remove RBAC objects already created while it was enabled.

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@marvin-tigera marvin-tigera added this to the v1.43.0 milestone May 26, 2026
@dimitri-nicolo dimitri-nicolo changed the title from "Dimitri pmreq 824 rbac mgmt" to "manager: add opt-in RBAC management UI via rbacManagement flag" May 26, 2026
@dimitri-nicolo dimitri-nicolo changed the title from "manager: add opt-in RBAC management UI via rbacManagement flag" to "Add opt-in RBAC management UI via rbacManagement flag" May 26, 2026
@dimitri-nicolo dimitri-nicolo force-pushed the dimitri-PMREQ-824-rbac-mgmt branch from 504568c to 8239694 May 27, 2026 00:13
@dimitri-nicolo dimitri-nicolo marked this pull request as ready for review May 27, 2026 00:33
@dimitri-nicolo dimitri-nicolo requested a review from a team as a code owner May 27, 2026 00:33
Comment thread api/v1/manager_types.go Outdated
Comment thread api/v1/manager_types.go Outdated
Comment thread pkg/controller/utils/utils.go Outdated
// that need a tenant-scoped Manager (the manager controller itself) must use
// GetManager from pkg/controller/manager. Non-manager controllers reading
// zero-tenant-only flags (e.g. spec.rbacManagement) belong here.
func GetZeroTenantManagerOrNil(ctx context.Context, c client.Client) (*operatorv1.Manager, error) {
Member

@rene-dekker rene-dekker May 27, 2026

Can't we move the GetManager func from manager_controller to the utils package and re-use it? There is no reason that apiserver should throw an error if manager is not present. We should however only fetch it when enterprise CRDs exist and we are not running in cloud.

Member

I see that you moved the func to here, but I also meant to delete this func entirely as well.

Member

I think the code is clearer if we delete this method, remove any notion of "zero tenant" from the code including comments (it's not jargon that is used today) and use the multitenant booleans explicitly at the caller. I think that will be more understandable to the average reader.

Comment thread api/v1/manager_types.go Outdated
// +optional
ManagerDeployment *ManagerDeployment `json:"managerDeployment,omitempty"`

// RBAC configures the RBAC management UI feature. Only honored in
Member

I don't think we should leak cloud (internal) implications into the API comments. Our users don't need to be aware/think about those.

Comment thread api/v1/manager_types.go Outdated
// Disabled.
// +optional
// +kubebuilder:validation:Enum=Enabled;Disabled
Mode RBACMode `json:"mode,omitempty"`
Member

What do you think of the following field names?

  • Management: Enabled|Disabled
  • UI: Enabled|Disabled

or values: Visible | Hidden

@dimitri-nicolo dimitri-nicolo force-pushed the dimitri-PMREQ-824-rbac-mgmt branch from b9493a6 to 94159c6 May 28, 2026 21:53

// RBAC management UI: when enabled on the Manager CR, the rbacsync
// controller runs inside calico-kube-controllers to reconcile the
// catalog of managed ClusterRoles (per-tier + 32 fine-grained + 6
Member

nit: This number is not likely to hold up over time, best to keep the comment succinct.

@rene-dekker
Member

rene-dekker commented May 28, 2026

General comment: the AI generated comments in this code are a bit excessive. An extreme example may be the permissions. The func explains in comments twice which permissions are added + the code to add them. They are making the code less readable. And now we need to also keep the comments up to date when we add more. The operator code is not super complicated, I'd much rather have comments be only used if it adds critical information you couldn't get from simply reading the code (or have your ai read it for you).

Comment thread pkg/render/manager.go Outdated
},
// Escalation coverage for webhook-mod resource roles (create/patch on
// Secrets to wire up webhook auth).
rbacv1.PolicyRule{
Member

Could we be more specific with secrets and configmaps and bind them to the namespaces we actually need?

Contributor Author

The named-resource access moved to a new Role + RoleBinding in calico-system, where tigera-idp-groups and tigera-idp-ldap-config actually live. The broad create is bound to that one namespace now.

Dropped the tigera-known-oidc-users rule (it lives in tigera-elasticsearch and is already granted by logstorage.go), and tightened the activation gate to !Tenant.MultiTenant() since the IdP resources are pinned to calico-system and the RBAC UI is zero-tenant-only anyway.

Is that what you meant by "bind them to the namespaces we actually need" — or were you pointing at something narrower?
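
For readers following the thread, the split described in this reply might look roughly like the sketch below. It uses local stand-in types shaped like Kubernetes RBAC rules instead of the real rbacv1 package, and the Role name `calico-manager-idp` is hypothetical. The resource names and the calico-system namespace are taken from this reply, and the verb list from the commit message's description of the name-scoped IdP rules. The `create` rule carries no resource names because Kubernetes RBAC cannot restrict `create` by resource name; scoping it to one namespace is what narrows it.

```go
package main

// PolicyRule and Role are simplified stand-ins for the rbacv1 types.
type PolicyRule struct {
	APIGroups     []string
	Resources     []string
	ResourceNames []string
	Verbs         []string
}

type Role struct {
	Name      string
	Namespace string
	Rules     []PolicyRule
}

// idpRole builds a namespaced Role for the IdP resources. The Role name is
// hypothetical; the resource names come from the review thread.
func idpRole(namespace string) Role {
	namedVerbs := []string{"get", "list", "watch", "update", "delete"}
	return Role{
		Name:      "calico-manager-idp",
		Namespace: namespace,
		Rules: []PolicyRule{
			{
				APIGroups:     []string{""},
				Resources:     []string{"secrets"},
				ResourceNames: []string{"tigera-idp-ldap-config"},
				Verbs:         namedVerbs,
			},
			{
				APIGroups:     []string{""},
				Resources:     []string{"configmaps"},
				ResourceNames: []string{"tigera-idp-groups"},
				Verbs:         namedVerbs,
			},
			{
				// create cannot be limited by resourceNames, so it is
				// limited by the Role's namespace instead.
				APIGroups: []string{""},
				Resources: []string{"configmaps", "secrets"},
				Verbs:     []string{"create"},
			},
		},
	}
}
```

Calling `idpRole("calico-system")` yields three rules, none of which apply outside that namespace once the Role is bound with a RoleBinding there.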

Comment thread pkg/render/manager.go
dimitri-nicolo and others added 9 commits May 29, 2026 15:05
Adds an opt-in Manager.spec.rbacManagement.enabled flag that the rendering
layer reads to gate the RBAC management UI feature surface. Zero-tenant
only; defaults to false. Existing managed RBAC objects are not
garbage-collected on disable — that's noted on the public CRD surface so
operators know the toggle is one-way for already-created objects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssions

Renders the RBAC management UI feature surface behind
Manager.spec.rbacManagement.enabled:

  - kube-controllers: enable the rbacsync controller stanza and its RBAC
    only when the flag is set (new rbac_management.go holds the shared
    rule helpers).
  - manager: gate the ui-apis container's RBAC management rules; add the
    IdP-groups LDAP integration rules (get/list/watch/update/delete on the
    tigera-idp-ldap-config Secret and tigera-idp-groups ConfigMap, scoped
    by name) and open egress to LDAP ports 389/636 only when enabled.
  - apiserver: extend tigera-network-admin so a network admin can manage
    ClusterRoles/Roles and (Cluster)RoleBindings via the UI, including the
    bind+escalate verbs needed to grant managed roles without holding every
    rule those roles carry.

Tests cover both the enabled and disabled rendering paths.
The kubecontrollers renderer needs Manager.spec.rbacManagement.enabled to
decide whether to enable the rbacsync controller stanza. The installation
controller is the one that builds kubecontrollers, so it reads the Manager
CR here (optional, nil-safe) and passes it through.

Adds a watch on Manager CR so toggling rbacManagement re-runs the
installation reconcile, and a GetZeroTenantManagerOrNil helper in utils
that does a cluster-scoped Get with no namespace. The name encodes the
contract: it's safe for the zero-tenant rbacManagement readout, but
multi-tenant callers (the manager controller) must keep using their own
tenant-aware GetManager.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dimitri-nicolo and others added 4 commits May 29, 2026 15:05
Move cluster-wide create on configmaps/secrets out of calico-manager-role
into a namespaced Role bound only in the manager namespace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline the IsNotFound→nil handling at the two callers (apiserver and
installation controllers) so they each manage their own absence policy
explicitly. Strip "zero tenant" from comments where it was load-bearing
in nothing but the prose.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dimitri-nicolo dimitri-nicolo force-pushed the dimitri-PMREQ-824-rbac-mgmt branch from 8833288 to 2b749da May 29, 2026 22:19
// management UI's directory cache and the cascading cleanup of managed
// bindings when a group is removed).
{
APIGroups: []string{""},
Member

This one should be in a role.
