feat: aks-node-controller self-update mechanism and version command support#8257
feat: aks-node-controller self-update mechanism and version command support#8257
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a hotfix-oriented self-update mechanism to aks-node-controller (ANC) plus a version/--version interface, and wires VHD build-time version stamping into the ANC binaries.
Changes:
- Stamp ANC binaries with a build-time version via
-ldflags(fromIMAGE_VERSION) and expose it viaversion/--version. - Add ANC self-update logic that reads a hotfix target version and installs a package via the host package manager before provisioning commands run.
- Add unit tests for the new helpers/retry logic and version exit-code behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
packer.mk |
Adds ldflags-based version stamping for ANC builds when IMAGE_VERSION is set. |
aks-node-controller/version.go |
Introduces main.Version build-time variable (default dev). |
aks-node-controller/selfupdate.go |
Implements hotfix version detection, package-manager install, retries, and exec re-launch. |
aks-node-controller/selfupdate_test.go |
Adds tests for hotfix-version parsing, pkg mgr detection (host-dependent), and retry logic. |
aks-node-controller/app.go |
Adds version/--version command handling and calls selfUpdate before command execution. |
aks-node-controller/app_test.go |
Adds tests asserting version/--version return success exit codes. |
The VHD build pipeline does not pass IMAGE_VERSION to the packer build step - it is only computed later in generate-vhd-publishing-info.sh. Use date-based fallback (YYYYMMDD.0.0) matching the VHD image_version format, while still honoring IMAGE_VERSION if explicitly set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ify expected behavior
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Make hotfixVersionPath and aptSourcesDir configurable via App fields for deterministic unit testing (no dependency on /etc filesystem) - Make reExec failure best-effort: log warning and continue with VHD-baked binary instead of aborting provisioning - Add mariner (Azure Linux v2) support in detectPackageManager - Prefer tdnf over dnf for Azure Linux family (tdnf is the native tool) - Rename installWithDnf to installWithRpm to handle both dnf and tdnf - Add TestSelfUpdate_VersionMatch using configurable path (no longer limited to indirect testing via readHotfixVersion) - Add TestSelfUpdate_UnreadableFile to verify graceful degradation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
|
||
| // Overwrite the VHD-baked binary so the hotfix persists across service restarts and reboots. | ||
| if err := replaceBinary(pkgBinaryPath, vhdBinaryPath); err != nil { | ||
| slog.Warn("failed to replace VHD binary with hotfix, proceeding with current binary", |
There was a problem hiding this comment.
Shouldn't this be slog.Error? As it's an error? I don't know the slog API very well tho.
| } | ||
|
|
||
| switch pkgMgr { | ||
| case "apt-get": |
There was a problem hiding this comment.
These should really be enums rather than strings - they're used in multiple places and have a restricted set of values.
| } | ||
|
|
||
| // Overwrite the VHD-baked binary so the hotfix persists across service restarts and reboots. | ||
| if err := replaceBinary(pkgBinaryPath, vhdBinaryPath); err != nil { |
There was a problem hiding this comment.
I'm not a huge fan of replacing the file for the binary that is running. Can we put it alongside the existing binary with an exteison like "vhdBinaryName-"?
I know that it's OK on Linux due to the way that linux filesystems work, but it will break windows. So it'd be nice to not have to fix this when we add windows support.
timmy-wright
left a comment
There was a problem hiding this comment.
- Use enums
- Consider windows - doesn't allow replacing a binary that is running
- error logging should prob use slog.Error
What this PR does / why we need it:
Feat: aks-node-controller self-update mechanism and version command support
This is needed for hot fixing aks-node-controller
Which issue(s) this PR fixes:
Fixes #