Add opt-in RBAC management UI via rbacManagement flag #4865
dimitri-nicolo wants to merge 13 commits
// that need a tenant-scoped Manager (the manager controller itself) must use
// GetManager from pkg/controller/manager. Non-manager controllers reading
// zero-tenant-only flags (e.g. spec.rbacManagement) belong here.
func GetZeroTenantManagerOrNil(ctx context.Context, c client.Client) (*operatorv1.Manager, error) {
Can't we move the GetManager func from manager_controller to the utils package and re-use it? There is no reason that apiserver should throw an error if manager is not present. We should however only fetch it when enterprise CRDs exist and we are not running in cloud.
I see that you moved the func to here, but I also meant to delete this func entirely as well.
I think the code is clearer if we delete this method, remove any notion of "zero tenant" from the code, including comments (it's not jargon that is used today), and use the multitenant booleans explicitly at the caller. I think that will be more understandable to the average reader.
// +optional
ManagerDeployment *ManagerDeployment `json:"managerDeployment,omitempty"`

// RBAC configures the RBAC management UI feature. Only honored in
I don't think we should leak cloud (internal) implications into the API comments. Our users don't need to be aware/think about those.
// Disabled.
// +optional
// +kubebuilder:validation:Enum=Enabled;Disabled
Mode RBACMode `json:"mode,omitempty"`
What do you think of the following field names?
- Management: Enabled|Disabled
- UI: Enabled|Disabled
or values: Visible | Hidden
// RBAC management UI: when enabled on the Manager CR, the rbacsync
// controller runs inside calico-kube-controllers to reconcile the
// catalog of managed ClusterRoles (per-tier + 32 fine-grained + 6
nit: This number is not likely to hold up over time, best to keep the comment succinct.
General comment: the AI-generated comments in this code are a bit excessive. An extreme example may be the permissions: the func explains in comments twice which permissions are added, on top of the code that adds them. They make the code less readable, and now we also need to keep the comments up to date when we add more. The operator code is not super complicated; I'd much rather have comments used only if they add critical information you couldn't get from simply reading the code (or having your AI read it for you).
},
// Escalation coverage for webhook-mod resource roles (create/patch on
// Secrets to wire up webhook auth).
rbacv1.PolicyRule{
Could we be more specific with secrets and configmaps and bind them to the namespaces we actually need?
The named-resource access moved to a new Role + RoleBinding in calico-system, where tigera-idp-groups and tigera-idp-ldap-config actually live. The broad create is bound to that one namespace now.
Dropped the tigera-known-oidc-users rule (it lives in tigera-elasticsearch and is already granted by logstorage.go), and tightened the activation gate to !Tenant.MultiTenant() since the IdP resources are pinned to calico-system and the RBAC UI is zero-tenant-only anyway.
Is that what you meant by "bind them to the namespaces we actually need" — or were you pointing at something narrower?
Adds an opt-in Manager.spec.rbacManagement.enabled flag that the rendering layer reads to gate the RBAC management UI feature surface. Zero-tenant only; defaults to false. Existing managed RBAC objects are not garbage-collected on disable — that's noted on the public CRD surface so operators know the toggle is one-way for already-created objects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssions
Renders the RBAC management UI feature surface behind
Manager.spec.rbacManagement.enabled:
- kube-controllers: enable the rbacsync controller stanza and its RBAC
only when the flag is set (new rbac_management.go holds the shared
rule helpers).
- manager: gate the ui-apis container's RBAC management rules; add the
IdP-groups LDAP integration rules (get/list/watch/update/delete on the
tigera-idp-ldap-config Secret and tigera-idp-groups ConfigMap, scoped
by name) and open egress to LDAP ports 389/636 only when enabled.
- apiserver: extend tigera-network-admin so a network admin can manage
ClusterRoles/Roles and (Cluster)RoleBindings via the UI, including the
bind+escalate verbs needed to grant managed roles without holding every
rule those roles carry.
Tests cover both the enabled and disabled rendering paths.
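The gating described in this commit could look roughly like the sketch below. It is illustrative only: `PolicyRule` is a local stand-in for `rbacv1.PolicyRule`, `networkAdminRules` is a hypothetical helper name, the base rule is a placeholder, and the exact verb list on the gated rule is an assumption (the commit message only states that bind and escalate are included).

```go
package main

import "fmt"

// PolicyRule is a trimmed stand-in for rbacv1.PolicyRule (k8s.io/api/rbac/v1),
// declared locally so the sketch compiles without external modules.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// networkAdminRules is a hypothetical helper: it returns the rules that are
// always present and appends the RBAC-management rule only when the flag is on.
func networkAdminRules(rbacManagementEnabled bool) []PolicyRule {
	// Placeholder for whatever the role carries regardless of the flag.
	rules := []PolicyRule{{
		APIGroups: []string{"projectcalico.org"},
		Resources: []string{"networkpolicies"},
		Verbs:     []string{"get", "list", "watch"},
	}}
	if !rbacManagementEnabled {
		return rules
	}
	// Assumed verb list; bind and escalate allow granting a managed role
	// without the granter holding every rule that role carries.
	return append(rules, PolicyRule{
		APIGroups: []string{"rbac.authorization.k8s.io"},
		Resources: []string{"clusterroles", "roles", "clusterrolebindings", "rolebindings"},
		Verbs:     []string{"get", "list", "watch", "create", "update", "patch", "delete", "bind", "escalate"},
	})
}

func main() {
	for _, enabled := range []bool{false, true} {
		fmt.Printf("enabled=%v rules=%d\n", enabled, len(networkAdminRules(enabled)))
	}
}
```

Keeping the escalation-capable rule behind an early return means the disabled path cannot pick up those verbs by accident, which is what the enabled/disabled rendering tests would check.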
The kubecontrollers renderer needs Manager.spec.rbacManagement.enabled to decide whether to enable the rbacsync controller stanza. The installation controller is the one that builds kubecontrollers, so it reads the Manager CR here (optional, nil-safe) and passes it through. Adds a watch on Manager CR so toggling rbacManagement re-runs the installation reconcile, and a GetZeroTenantManagerOrNil helper in utils that does a cluster-scoped Get with no namespace. The name encodes the contract: it's safe for the zero-tenant rbacManagement readout, but multi-tenant callers (the manager controller) must keep using their own tenant-aware GetManager. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move cluster-wide create on configmaps/secrets out of calico-manager-role into a namespaced Role bound only in the manager namespace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline the IsNotFound→nil handling at the two callers (apiserver and installation controllers) so they each manage their own absence policy explicitly. Strip "zero tenant" from comments where it was load-bearing in nothing but the prose. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
// management UI's directory cache and the cascading cleanup of managed
// bindings when a group is removed).
{
APIGroups: []string{""},
This one should be in a role.
Description
Type: New feature (Calico Enterprise only). Addresses PMREQ-824.
Adds an opt-in RBAC management UI feature, toggled by a new Manager.spec.rbacManagement.enabled flag. When enabled, the operator renders the additional permissions and network access the UI needs to let an administrator manage role/group assignments from the Manager: maintaining a catalog of managed ClusterRoles, binding IdP groups to roles, and discovering groups from an external LDAP directory. The flag is zero-tenant only and defaults to false, so existing clusters are unaffected until an operator explicitly turns it on.
This should be merged because it's the operator-side enablement for the RBAC management UI; without it the UI's backend has no RBAC or egress to function, and gating keeps the feature entirely dormant (and its escalation-capable permissions un-granted) on clusters that don't use it.
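The default-off behaviour can be sketched as a nil-safe accessor, in the spirit of the Manager.RBACManagementEnabled() helper this PR describes. The type layout below is an assumption made for illustration and is not taken from the operator's API package.

```go
package main

import "fmt"

// RBACManagement, ManagerSpec and Manager are simplified stand-ins for the
// operator's API types; the field layout is assumed for illustration only.
type RBACManagement struct {
	Enabled bool
}

type ManagerSpec struct {
	RBACManagement *RBACManagement
}

type Manager struct {
	Spec ManagerSpec
}

// RBACManagementEnabled reports whether the feature is switched on. It is
// safe to call on a nil *Manager and treats every missing piece as disabled.
func (m *Manager) RBACManagementEnabled() bool {
	if m == nil || m.Spec.RBACManagement == nil {
		return false
	}
	return m.Spec.RBACManagement.Enabled
}

func main() {
	var absent *Manager // no Manager resource at all
	unset := &Manager{} // Manager exists, field omitted
	on := &Manager{Spec: ManagerSpec{RBACManagement: &RBACManagement{Enabled: true}}}

	fmt.Println(absent.RBACManagementEnabled(), unset.RBACManagementEnabled(), on.RBACManagementEnabled())
	// prints "false false true"
}
```

Because a pointer-receiver method may be called on a nil pointer in Go, callers can pass a possibly-nil *Manager around without a guard at every use.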
What the flag turns on, by layer
- api/v1/manager_types.go: new RBACManagement spec struct + nil-safe Manager.RBACManagementEnabled() helper; regenerated deepcopy and CRD. Disabling after enabling does not garbage-collect already-rendered RBAC objects; this is documented on the CRD surface as one-way.
- calico-kube-controllers: enables the rbacsync controller stanza and its RBAC only when the flag is set (controller is dormant otherwise).
- calico-manager role: adds the RBAC management UI permissions (full CRUD on managed RBAC objects, escalation-prevention coverage for tier/resource roles, Dex login-cache and IdP-LDAP/IdP-groups access) and opens egress to LDAP ports 389/636.
- tigera-network-admin (apiserver): grants the bind/escalate-capable rule needed to assign managed roles via the UI; also gated on the flag so non-RBAC-UI clusters never receive these verbs.
- pkg/render/rbac_management.go: holds the shared rule helpers so the manager and kube-controllers roles stay aligned.
Why the installation and apiserver controllers read the Manager CR
The flag conceptually belongs to the Manager, but two of the resources it gates are not rendered by the manager controller:
- The rbacsync controller runs inside calico-kube-controllers, which is rendered by the installation controller.
- The tigera-network-admin ClusterRole is rendered by the apiserver controller.
So each of those controllers has to read Manager.spec.rbacManagement.enabled itself to make its gating decision. Both do so via a new nil-safe helper, utils.GetZeroTenantManagerOrNil, and pass the resulting *Manager (or nil) into their render config:
- KubeControllersConfiguration.Manager, consumed as cfg.Manager.RBACManagementEnabled() to decide whether to append the rbacsync controller + its RBAC.
- APIServerConfiguration.RBACManagementEnabled, used to decide whether to append the bind/escalate rule to tigera-network-admin.
The *Manager (rather than a pre-computed bool) is threaded through where convenient so the nil handling stays in the one helper; RBACManagementEnabled() returns false for a nil receiver, matching the opt-in, fail-safe-off intent.
Why "zero-tenant": in multi-tenant management clusters each tenant has its own namespaced Manager, fetched tenant-aware by the manager controller's existing GetManager. The installation and apiserver controllers are cluster-scoped and have no tenant namespace in scope, and rbacManagement is a zero-tenant-only feature, so the new helper deliberately does a cluster-scoped Get ({Name: "tigera-secure"}, no namespace). Its name is a guardrail: callers needing a tenant-scoped Manager must keep using GetManager; non-manager controllers reading zero-tenant-only flags use GetZeroTenantManagerOrNil. On a multi-tenant cluster the cluster-scoped Manager typically doesn't exist, so the helper returns nil and the feature stays off, which is the intended fail-safe behavior.
Why the new watches: both controllers also add a watch on the Manager CR. Without it, toggling rbacManagement.enabled would not re-trigger an installation/apiserver reconcile, so the rbacsync controller and the network-admin rule wouldn't appear or disappear until some unrelated event happened to re-run those controllers.
Components affected: apiserver (tigera-network-admin), manager (calico-manager role + network policy), kube-controllers (rbacsync), Manager CRD, installation & apiserver controllers.
Release Note
For PR author
- make gen-files
- make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
- kind/bug if this is a bugfix.
- kind/enhancement if this is a new feature.
- enterprise if this PR applies to Calico Enterprise only.