Skip to content

feat: aks-node-controller self-update mechanism and version command support#8257

Open
Devinwong wants to merge 16 commits intomainfrom
devinwon/anc_hotfix_changes
Open

feat: aks-node-controller self-update mechanism and version command support#8257
Devinwong wants to merge 16 commits intomainfrom
devinwon/anc_hotfix_changes

Conversation

@Devinwong
Copy link
Copy Markdown
Collaborator

What this PR does / why we need it:
Feat: aks-node-controller self-update mechanism and version command support
This is needed for hot fixing aks-node-controller

Which issue(s) this PR fixes:

Fixes #

Copilot AI review requested due to automatic review settings April 9, 2026 17:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a hotfix-oriented self-update mechanism to aks-node-controller (ANC) plus a version/--version interface, and wires VHD build-time version stamping into the ANC binaries.

Changes:

  • Stamp ANC binaries with a build-time version via -ldflags (from IMAGE_VERSION) and expose it via version / --version.
  • Add ANC self-update logic that reads a hotfix target version and installs a package via the host package manager before provisioning commands run.
  • Add unit tests for the new helpers/retry logic and version exit-code behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packer.mk Adds ldflags-based version stamping for ANC builds when IMAGE_VERSION is set.
aks-node-controller/version.go Introduces main.Version build-time variable (default dev).
aks-node-controller/selfupdate.go Implements hotfix version detection, package-manager install, retries, and exec re-launch.
aks-node-controller/selfupdate_test.go Adds tests for hotfix-version parsing, pkg mgr detection (host-dependent), and retry logic.
aks-node-controller/app.go Adds version/--version command handling and calls selfUpdate before command execution.
aks-node-controller/app_test.go Adds tests asserting version/--version return success exit codes.

The VHD build pipeline does not pass IMAGE_VERSION to the packer build
step - it is only computed later in generate-vhd-publishing-info.sh.
Use date-based fallback (YYYYMMDD.0.0) matching the VHD image_version
format, while still honoring IMAGE_VERSION if explicitly set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 9, 2026 19:01
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

- Make hotfixVersionPath and aptSourcesDir configurable via App fields
  for deterministic unit testing (no dependency on /etc filesystem)
- Make reExec failure best-effort: log warning and continue with
  VHD-baked binary instead of aborting provisioning
- Add mariner (Azure Linux v2) support in detectPackageManager
- Prefer tdnf over dnf for Azure Linux family (tdnf is the native tool)
- Rename installWithDnf to installWithRpm to handle both dnf and tdnf
- Add TestSelfUpdate_VersionMatch using configurable path (no longer
  limited to indirect testing via readHotfixVersion)
- Add TestSelfUpdate_UnreadableFile to verify graceful degradation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings April 9, 2026 20:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


// Overwrite the VHD-baked binary so the hotfix persists across service restarts and reboots.
if err := replaceBinary(pkgBinaryPath, vhdBinaryPath); err != nil {
slog.Warn("failed to replace VHD binary with hotfix, proceeding with current binary",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be slog.Error? As it's an error? I don't know the slog API very well tho.

}

switch pkgMgr {
case "apt-get":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should really be enums rather than strings - they're used in multiple places and have a restricted set of values.

}

// Overwrite the VHD-baked binary so the hotfix persists across service restarts and reboots.
if err := replaceBinary(pkgBinaryPath, vhdBinaryPath); err != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of replacing the file for the binary that is running. Can we put it alongside the existing binary with an exteison like "vhdBinaryName-"?

I know that it's OK on Linux due to the way that linux filesystems work, but it will break windows. So it'd be nice to not have to fix this when we add windows support.

Copy link
Copy Markdown
Contributor

@timmy-wright timmy-wright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Use enums
  • Consider windows - doesn't allow replacing a binary that is running
  • error logging should prob use slog.Error

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants