feat: implement config backup controller#321
Conversation
0a41b4c to
75b5112
Compare
felix-kaestner
left a comment
There was a problem hiding this comment.
@elinalin did you even test your changes at all? If so, how? When I run the operator with these changes I get a clear
ConfigBackup.networking.metal.ironcore.dev", "error": "no matches for kind \"ConfigBackup\" in version \"networking.metal.ironcore.dev/v1alpha1\""
Meaning that the operator won't even start or doesn't know this api type. Again, because the scaffolding from kubebuilder is missing.
I did test the controller/provider logic, including real NX-OS Local backup behavior(the lab container) , but you are right that I missed the fresh deployment/install path. I checked the code after your comment:
So this is not missing Go-side kubebuilder scaffolding. The real issue is that the ConfigBackup CRD is not being delivered/installed in the deployment path you used, so the API server does not recognize the kind and the operator cannot start cleanly against that cluster. I’ll fix the install artifacts and validate again from a fresh deployment. |
da85da4 to
914cfb0
Compare
|
|
||
| // +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=configbackups,verbs=get;list;watch;create;update;patch;delete | ||
| // +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=configbackups/status,verbs=get;update;patch | ||
| // +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch |
There was a problem hiding this comment.
| // +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch | |
| // +kubebuilder:rbac:groups=events.k8s.io,resources=events,verbs=create;patch |
There was a problem hiding this comment.
This was not change. Please don't mark as resolved, when not addressed.
| TotalBytes *int64 | ||
| UsedBytes *int64 | ||
| FreeBytes *int64 |
There was a problem hiding this comment.
| TotalBytes *int64 | |
| UsedBytes *int64 | |
| FreeBytes *int64 | |
| TotalBytes int64 | |
| UsedBytes int64 | |
| FreeBytes int64 |
don't see a reason why these should be pointers.
There was a problem hiding this comment.
Keep these as pointers because nil represents “unknown / not reported by the provider”, which is distinct from a real 0 value. The controller currently relies on that distinction when computing free percent and evaluating storage thresholds.
aaf8c11 to
d58c19e
Compare
441571d to
09faff4
Compare
Signed-off-by: Minmin Lin <minmin.lin@sap.com>
09faff4 to
415baf6
Compare
Merging this branch will increase overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
| InterLeakPKey: InterLeakPKey{ | ||
| Asn: "none", | ||
| Inst: "none", | ||
| Asn: interLeakProtocolDefault, |
There was a problem hiding this comment.
Why was this changed? Seems unrelated to the PR.
| "github.com/ironcore-dev/network-operator/internal/apistatus" | ||
| ) | ||
|
|
||
| const managementInterfaceName = "mgmt0" |
There was a problem hiding this comment.
What is the reason for this change?
| Proto RtLeakProto `json:"proto"` | ||
| } | ||
|
|
||
| const interLeakProtocolDefault = "none" |
There was a problem hiding this comment.
Again, unrelated change to the concept of this PR.
|
|
||
| prepare-static-check: FORCE install-goimports install-golangci-lint install-shellcheck install-typos install-go-licence-detector install-addlicense | ||
|
|
||
| CONTROLLER_GEN_VERSION ?= v0.21.0 |
There was a problem hiding this comment.
Please revert. The Makefile is currently generated by go-makefile-maker, so any manual changes not done through the Makefile.maker.yaml configuration file are overritten on the next invocation.
| "regexp" | ||
| ) | ||
|
|
||
| const jsonNullLiteral = "null" |
There was a problem hiding this comment.
Again, unrelated changes to the PR.
Please don't bloat your PR with such unrelated changes.
If you want to do this. Please open a separate PR, but that shouldn't go into a "BackupConfig Controller" PR.
| type ConfigBackupRequest struct { | ||
| ConfigBackup *v1alpha1.ConfigBackup | ||
| ProviderConfig *ProviderConfig | ||
| } | ||
|
|
||
| type DeleteConfigBackupsRequest struct { | ||
| ConfigBackup *v1alpha1.ConfigBackup | ||
| ProviderConfig *ProviderConfig | ||
| } |
There was a problem hiding this comment.
DeleteConfigBackupsRequest is identical to ConfigBackupRequest. Therefore, we can just use ConfigBackupRequest for the parameter of the DeleteConfigBackups func. No need to duplicate the same struct.
| if obj == nil { | ||
| return "" | ||
| } | ||
| return fmt.Sprintf("configbackup-%s-%s-", obj.Namespace, obj.Name) |
There was a problem hiding this comment.
Please just make this a method on the *v1alpha1.ConfigBackup type. Perhaps something like
func (c *v1alpha1.ConfigBackup) Filename() string {
return fmt.Sprintf("configbackup-%s-%s-", obj.Namespace, obj.Name)
}(note: such method should have a nil check).
|
|
||
| // +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=configbackups,verbs=get;list;watch;create;update;patch;delete | ||
| // +kubebuilder:rbac:groups=networking.metal.ironcore.dev,resources=configbackups/status,verbs=get;update;patch | ||
| // +kubebuilder:rbac:groups=core,resources=events,verbs=create;patch |
There was a problem hiding this comment.
This was not change. Please don't mark as resolved, when not addressed.
| } | ||
|
|
||
| orig := obj.DeepCopy() | ||
| if conditions.InitializeConditions(obj, v1alpha1.ReadyCondition, v1alpha1.ConfiguredCondition, v1alpha1.OperationalCondition) { |
There was a problem hiding this comment.
Again, an Operational condition doesn't make much sense for this resource type, as there is no Operational State that is queried from the Device. So please just use a single Ready condition for tracking all Success/Error cases, similar to how it's done e.g. for the AccessControlList resource type.
| func configBackupRequestFor(obj *v1alpha1.ConfigBackup) *provider.ConfigBackupRequest { | ||
| return &provider.ConfigBackupRequest{ConfigBackup: obj} | ||
| } |
There was a problem hiding this comment.
Please avoid adding such funcs. They add very little value as a separate function and just make it harder to read the code. Just do &provider.ConfigBackupRequest{ConfigBackup: obj} inline. This is a oneliner.
| return &provider.ConfigBackupRequest{ConfigBackup: obj} | ||
| } | ||
|
|
||
| const maxInt32 = int(^uint32(0) >> 1) |
There was a problem hiding this comment.
There is just math.MaxInt32 in the standard library. No need to define this ourselves. Also please think about if it is realistic to have so many backups and if it is worth adding this complexity or if we should rather assume to never have more than 2147483647 backups (which I think is realistic).
| TotalBytes: cloneInt64Ptr(inventory.TotalBytes), | ||
| UsedBytes: cloneInt64Ptr(inventory.UsedBytes), | ||
| FreeBytes: cloneInt64Ptr(inventory.FreeBytes), |
There was a problem hiding this comment.
What is the reason to clone a *in32 here? I think we can just assign it as is (no need to cloning).
| conn *grpc.ClientConn | ||
| client gnmiext.Client | ||
| nxapi *nxapi.Client | ||
| baseConn *deviceutil.Connection |
There was a problem hiding this comment.
Why do we have this baseConn field? It serves no purpose.
| type ConfigBackupInventory struct { | ||
| Backups []ConfigBackupFile | ||
| TotalBytes *int64 | ||
| UsedBytes *int64 |
There was a problem hiding this comment.
Can't UsedBytes simply be derived from the Backups []ConfigBackupFile? Something like:
func (c *ConfigBackupInventory) UsedBytes() int64 {
sum := int64(0)
for _, file := range c.Backups {
sum += c.SizeBytes
}
return sum
}|
|
||
| func prometheusLabels(backupType v1alpha1.ConfigBackupType, result string) []string { | ||
| return []string{result, string(backupType)} | ||
| } |
There was a problem hiding this comment.
Just inline this. Having such oneliner helper functions just necessarily increases complexity for a reader to follow the code.
Summary
This PR introduces the
ConfigBackupcontroller for on-device configuration backups.The controller supports:
type: Localfor timestamped backups written to device-local storagetype: Startupfor persisting the running configuration asstartup-configThe initial provider implementation targets Cisco NX-OS.
What Is Included
ConfigBackupCRD with spec and statusConfigBackupcontroller reconcile flowValidation Performed
I validated this change at multiple levels.
1. Controller and provider tests
I ran the controller and provider-side tests to validate:
2. Live device validation for local backup behavior
On the lab device, I verified the local backup path directly:
This validated the device-side behavior for
type: Local.3. Full local E2E validation for startup backup
I also validated the full local end-to-end flow for
type: Startup:DeviceConfigBackupConfigConfigBackupDevicereconciled toRunning/Reachable=TrueConfigBackupreconciled toReady=Truestatus.lastBackup.locationwas set tostartup-configBackupCompletedevent was recordedThis addresses the previously missing local operator E2E validation for the startup backup path.
Notes
type: Startuprequires NX-API because gNMI cannot executecopy running-config startup-config.ConfigBackupConfigthat can override the NX-API address forConfigBackup.Example E2E Assets
examples/configbackup-startup-local-e2e/manifest.yamlexamples/configbackup-startup-local-e2e/README.md