feat(anc): support a base-to-version hotfix map in downloadHotfix #8694
Devinwong wants to merge 5 commits into
…tfix M1 2.1a)

Adds a base (YYYYMM.DD) -> hotfix version (YYYYMM.DD.PATCH) map to the ANC hotfix config so a single config can pin hotfixes for multiple VHD bases at once, with default-deny for unlisted bases. When the map is empty, the legacy single 'version' field is still honored, for full backward compatibility.

- hotfixConfig gains a Hotfixes map; resolveVersion() applies the map first, then the legacy fallback; hotfixBaseFromVersion() splits on '.' to preserve the leading-zero day so map keys match exactly.
- readHotfixConfig() added; readHotfixVersion() retained via it.
- downloadHotfix() resolves via the map and still gates through the unchanged shouldUpgradeToHotfix() semantics (same base, patch strictly higher).
- An unparseable current version with a map present fails open (no hotfix).

Part of the Provisioning-Hotfix / live-patching-controller ConfigMap design (M1).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
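The map-first resolution described in this commit can be sketched in Go as follows. This is a minimal, hedged sketch, not the code in `hotfix.go`: the struct fields and JSON tags are inferred from the config examples in the PR description, the signatures of `hotfixBaseFromVersion` and `resolveVersion` are assumptions, and it folds in the empty-patch rejection that a later commit adds.

```go
package main

import (
	"fmt"
	"strings"
)

// hotfixConfig is an assumed shape based on the PR examples:
// {"version": "..."} (legacy) and {"hotfixes": {"YYYYMM.DD": "YYYYMM.DD.PATCH"}}.
// Field names and JSON tags are assumptions; the real struct may differ.
type hotfixConfig struct {
	Version  string            `json:"version,omitempty"`
	Hotfixes map[string]string `json:"hotfixes,omitempty"`
}

// hotfixBaseFromVersion returns the "YYYYMM.DD" base of "YYYYMM.DD.PATCH".
// It splits on "." instead of parsing semver, so a leading-zero day such as
// "01" is preserved and matches map keys exactly. This sketch requires
// exactly three non-empty segments, so "202604.01." is rejected.
func hotfixBaseFromVersion(v string) (string, bool) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return "", false
	}
	for _, p := range parts {
		if p == "" {
			return "", false
		}
	}
	return parts[0] + "." + parts[1], true
}

// resolveVersion returns the candidate hotfix version for the node's current
// ANC version, or "" when there is nothing to consider.
func (c hotfixConfig) resolveVersion(current string) string {
	if len(c.Hotfixes) == 0 {
		// Legacy fallback: behave exactly as the single "version" field did.
		return c.Version
	}
	base, ok := hotfixBaseFromVersion(current)
	if !ok {
		// Unparseable current version with a map present: fail open.
		return ""
	}
	// A missing key yields "", which gives default-deny for unlisted bases.
	return c.Hotfixes[base]
}

func main() {
	m := hotfixConfig{Hotfixes: map[string]string{"202604.01": "202604.01.3"}}
	fmt.Printf("%q\n", m.resolveVersion("202604.01.0")) // "202604.01.3"
	fmt.Printf("%q\n", m.resolveVersion("202605.30.0")) // "" (base not listed)
	fmt.Printf("%q\n", m.resolveVersion("202604.01."))  // "" (empty patch)

	legacy := hotfixConfig{Version: "202604.01.3"}
	fmt.Printf("%q\n", legacy.resolveVersion("202604.01.0")) // "202604.01.3"
}
```

The value returned here is only a candidate: per the PR, `downloadHotfix()` still passes it through `shouldUpgradeToHotfix()` before installing anything, so a map hit alone never forces an upgrade.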
Pull request overview
This PR extends aks-node-controller hotfix resolution to support a base→hotfix mapping, allowing one config to target multiple ANC “base” versions (YYYYMM.DD) while keeping backward compatibility with the legacy single version field.
Changes:
- Add `hotfixConfig.Hotfixes` map (base -> hotfix version) with default-deny behavior for unlisted bases.
- Implement base extraction (`hotfixBaseFromVersion`) and config-aware resolution (`hotfixConfig.resolveVersion`) with legacy fallback.
- Add unit tests covering config parsing, base extraction, resolution behavior, and download behavior with the map.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| aks-node-controller/hotfix.go | Adds map-based hotfix resolution and config parsing helpers; wires resolution into downloadHotfix(). |
| aks-node-controller/hotfix_test.go | Adds tests for config parsing, base extraction, resolution precedence, and map-driven download behavior. |
…o-ops

Adds TestDownloadHotfix_MapMisconfiguredValueBaseSkips: when a hotfixes map entry's value base (YYYYMM.DD) does not match its key, resolveVersion selects it by key but shouldUpgradeToHotfix rejects it because the bases differ, so no wrong-base binary is installed. This locks in the default-safe behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
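The gate this test locks in can be sketched as below. This is a hedged reconstruction from the PR text only: the signature `shouldUpgradeToHotfix(current, hotfix string) bool` and the helper `parseHotfixVersion` are hypothetical, and the real function in `hotfix.go` may parse versions differently. It shows why a misconfigured entry (key `202604.01`, value from another base) cannot install anything.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHotfixVersion (hypothetical helper) splits "YYYYMM.DD.PATCH" into its
// "YYYYMM.DD" base and a numeric patch; any other shape returns ok=false.
func parseHotfixVersion(v string) (string, uint64, bool) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" {
		return "", 0, false
	}
	// ParseUint rejects empty strings and signs, so "202604.01." fails here.
	patch, err := strconv.ParseUint(parts[2], 10, 64)
	if err != nil {
		return "", 0, false
	}
	return parts[0] + "." + parts[1], patch, true
}

// shouldUpgradeToHotfix sketches the described semantics: upgrade only when
// both versions parse, the bases are identical, and the hotfix patch is
// strictly higher. Anything else means "do not upgrade".
func shouldUpgradeToHotfix(current, hotfix string) bool {
	curBase, curPatch, ok := parseHotfixVersion(current)
	if !ok {
		return false
	}
	hfBase, hfPatch, ok := parseHotfixVersion(hotfix)
	if !ok {
		return false
	}
	return curBase == hfBase && hfPatch > curPatch
}

func main() {
	current := "202604.01.0"
	fmt.Println(shouldUpgradeToHotfix(current, "202604.01.3")) // true: same base, higher patch
	fmt.Println(shouldUpgradeToHotfix(current, "202604.01.0")) // false: patch not higher
	// Misconfigured map entry: key "202604.01" but value from another base.
	fmt.Println(shouldUpgradeToHotfix(current, "202605.30.1")) // false: bases differ
	fmt.Println(shouldUpgradeToHotfix(current, ""))            // false: nothing resolved
}
```

Comparing bases as exact strings, rather than as parsed numbers, keeps the check consistent with the exact-match map keys.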
Addresses PR review:

- downloadHotfix now logs and skips (returns nil) when the hotfix config is unreadable or invalid JSON, instead of returning an error. This honors the fail-open guarantee, so a malformed config can never block provisioning.
- hotfixBaseFromVersion now rejects a present-but-empty patch segment (e.g. '202604.01.') so an obviously malformed current version never selects a map entry, matching the documented YYYYMM.DD.PATCH contract.
- Tests: replace TestDownloadHotfix_UnreadableFile with fail-open assertions, add TestDownloadHotfix_InvalidJSONFailsOpen, and cover the empty-patch case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses PR review on the two fail-open tests so they prove the skip is caused specifically by the unreadable/invalid config, not by an incidental version-parse skip:

- Set a parseable, hotfix-eligible Version (202604.01.0) and configure aptSourcesDir so that a readable/valid config would proceed to install and flip installCalled. The only reason install does not fire is the config-read failure (fail-open).
- Make the unreadable case robust cross-platform: if chmod 0000 is ineffective, replace the path with a directory so the read genuinely fails.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The legacy readHotfixVersion function had no production callers after downloadHotfix switched to readHotfixConfig + resolveVersion. Remove it and fold its forward-compat coverage into TestReadHotfixConfig.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
```go
hotfixPath := a.hotfixVersionPath
if hotfixPath == "" {
	hotfixPath = defaultHotfixVersionPath
}
```
🕵️ AgentBaker Linux gate: automated RCA for build 167737875

Signature:

Level 1: Surface
115 scenarios across 75+ distinct top-level tests fail at All on shared cluster

Level 2: Corroboration (≥2 independent evidence sources)
Cluster create + bastion + ARM/VMSS provisioning are all healthy. Failure is strictly at the post-VMSS-create kubelet → apiserver node-registration step.

Level 3: Root cause + strongest alternative
Root cause (accepted): Shared cluster
Strongest alternative considered: PR #8694 ANC hotfix M1 2.1a (
Action
Provisioning-Hotfix groundwork - Milestone 1 (POC draft)
Background
A "hotfix" is a small change to the ANC binary or a few `/opt/azure/containers/*.sh` scripts, shipped as one PMC package and applied at node provisioning time on top of VHD-baked content. VMSS model customdata can go stale on scale-out, autoscaling, or reimage, so new VMs may boot with buggy code. The broader design adds a second pointer channel that the node reads at boot; this PR lays the AgentBaker (Node SIG) groundwork.

What 2.1a does
Extends `aks-node-controller/hotfix.go`:

- `hotfixConfig` gains `Hotfixes map[string]string`, mapping an ANC version base (`YYYYMM.DD`) to a hotfix version (`YYYYMM.DD.PATCH`), so one config can pin hotfixes for multiple VHD bases at once, with default-deny for any base not listed.
- `resolveVersion()` applies the map first, then falls back to the legacy single `version` field when the map is empty (full backward compatibility).
- `hotfixBaseFromVersion()` splits on `.` (not semver) to preserve the leading-zero day so map keys match exactly (e.g. `202604.01`).
- `readHotfixConfig()` parses the shared config shape.
- `downloadHotfix()` resolves via the map and still gates through the unchanged `shouldUpgradeToHotfix()` "same base, patch strictly higher" semantics.

Net effect (examples)
The node's baked-in ANC reports its own version (e.g. `202604.01.0`). `downloadHotfix` resolves the config against that version and upgrades only when there is a higher patch for the same base.

Example 1 - map targets this node's base, higher patch -> upgrade

- Node ANC version: `202604.01.0`
- Map lookup: `202604.01` -> `202604.01.3`
- Result: upgrade to `202604.01.3`

Example 2 - base not listed -> no-op (default-deny)
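- Config: `{ "hotfixes": { "202605.30": "202605.30.1" } }`
- Node ANC version: `202604.01.0`
- Result: no entry for base `202604.01`, so no-op

Example 3 - patch not higher -> no-op

- Config: `{ "hotfixes": { "202604.01": "202604.01.0" } }`
- Node ANC version: `202604.01.0`
- Map lookup: `202604.01` -> `202604.01.0`
- Result: patch `0` is not strictly higher than `0`, so no-op

Example 4 - legacy single-version config (unchanged behavior)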
- Config: `{ "version": "202604.01.3" }`

Resolves exactly as before: upgrade only if `202604.01.3` is a higher patch of the same base as the baked ANC.

Files changed
- `aks-node-controller/hotfix.go`
- `aks-node-controller/hotfix_test.go`

Test results
`go build ./...` passes; all pure-logic unit tests pass. Two tests (`TestDownloadHotfix_MatchingBaseUpgrades`, `TestDownloadHotfix_MapMatchingBaseUpgrades`) fail on Windows only because they exercise package-manager detection that needs `/etc/os-release`, `bash`, and Unix file permissions. They pass in Linux CI and are unrelated to this change.