test: (scriptless) Enable scriptless phase 3 in AB e2es #8453
Pull request overview
Enables “scriptless phase 3” coverage in the AgentBaker e2e suite by adding a new scriptless_anc subtest path that provisions nodes using AKSNodeConfig/aks-node-controller, plus wiring many existing scenarios to provide an AKSNodeConfigMutator.
Changes:
- Added a new scriptless_anc subtest variant and runtime flag (EnableScriptlessANC) to drive scriptless phase-3 execution.
- Refactored/expanded the e2e "aks-node-controller hack" customData generation to optionally include AKSNodeConfig and/or an nbc-cmd script.
- Updated many scenarios to set equivalent AKSNodeConfigMutator fields alongside existing NBC mutators.
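As a rough illustration of the first and third bullets, the sketch below shows one way a scenario could fan out into a default subtest plus a scriptless_anc subtest. Only the names scriptless_anc, EnableScriptlessANC, and AKSNodeConfigMutator come from this PR; the types, the NBCMutator field, the "default" subtest name, and the gating rule (run scriptless_anc only when a mutator is supplied) are assumptions made for the example, not the actual helper in e2e/test_helpers.go.

```go
package main

import "fmt"

// NodeBootstrappingConfig and AKSNodeConfig are hypothetical stand-ins for the
// real AgentBaker types; only one illustrative field is modeled.
type NodeBootstrappingConfig struct{ MaxPods int }
type AKSNodeConfig struct{ MaxPods int }

// Scenario is a simplified, hypothetical e2e scenario. NBCMutator is an
// invented field name; AKSNodeConfigMutator is the name used in the PR.
type Scenario struct {
	Name                 string
	NBCMutator           func(*NodeBootstrappingConfig)
	AKSNodeConfigMutator func(*AKSNodeConfig)
}

// Variant describes one subtest to run for a scenario.
type Variant struct {
	Name                string
	EnableScriptlessANC bool
}

// variantsFor always returns a default subtest and, as an assumed gating
// rule, adds a scriptless_anc subtest when the scenario has an
// AKSNodeConfigMutator.
func variantsFor(s Scenario) []Variant {
	variants := []Variant{{Name: s.Name + "/default"}}
	if s.AKSNodeConfigMutator != nil {
		variants = append(variants, Variant{
			Name:                s.Name + "/scriptless_anc",
			EnableScriptlessANC: true,
		})
	}
	return variants
}

func main() {
	s := Scenario{
		Name:                 "Test_Example", // hypothetical scenario name
		NBCMutator:           func(c *NodeBootstrappingConfig) { c.MaxPods = 50 },
		AKSNodeConfigMutator: func(c *AKSNodeConfig) { c.MaxPods = 50 },
	}
	for _, v := range variantsFor(s) {
		fmt.Println(v.Name, v.EnableScriptlessANC)
	}
}
```

Keeping the two mutators side by side in one scenario makes it easier to spot when the NBC and AKSNodeConfig paths drift apart.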
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| e2e/vmss.go | Refactors customData hack generation and wires scriptless ANC + NBC cmd hack paths into VMSS creation. |
| e2e/types.go | Adds EnableScriptlessANC and adjusts kubelet-config-file detection logic for scriptless ANC scenarios. |
| e2e/test_helpers.go | Adds scriptless_anc subtest generation and new gating helper. |
| e2e/scenario_test.go | Adds AKSNodeConfigMutator coverage across many existing scenarios. |
…e/AgentBaker into lily/scriptless/phase-3-e2e
Co-authored-by: lilypan26 <106703606+lilypan26@users.noreply.github.com>
AgentBaker Linux PR gate — E2E mass-failure (change-caused, HIGH confidence)
Two failure shapes, same root cause:
Likely root cause (
Build-vs-test: test-code-caused (e2e converter + scenario tagging) with a tightly-coupled product-code enabler (
Confidence: HIGH. Three independent indicators converge: PR's own validator firing loudly on ACL/default, the
Strongest alternative (less likely):
Side-channel (not the cause, FYI):
Recommended next action (owner: @lilypan26):
Posted by Clawpilot AgentBaker gate detective.
go 1.25.10

require (
	github.com/Azure/agentbaker/aks-node-controller v0.0.0-20241215075802-f13a779d5362
Oh! This version won't exist when we import this into RP, will it? We import the v20260527.0 types there.
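For context on the version string in question: v0.0.0-20241215075802-f13a779d5362 has the shape of a Go pseudo-version (base version, UTC commit timestamp, 12-character revision prefix), whereas v20260527.0 does not. The sketch below uses only the standard library to split the simple form apart. The function name is hypothetical, and it deliberately ignores the pre-release and +incompatible pseudo-version forms.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parsePseudoVersion splits a pseudo-version of the simple form
// vX.Y.Z-yyyymmddhhmmss-abcdefabcdef into its commit time and revision
// prefix. Other pseudo-version forms are not handled in this sketch.
func parsePseudoVersion(v string) (time.Time, string, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 3 {
		return time.Time{}, "", fmt.Errorf("not a simple pseudo-version: %q", v)
	}
	ts, err := time.Parse("20060102150405", parts[1])
	if err != nil {
		return time.Time{}, "", fmt.Errorf("bad timestamp in %q: %w", v, err)
	}
	return ts, parts[2], nil
}

func main() {
	ts, rev, err := parsePseudoVersion("v0.0.0-20241215075802-f13a779d5362")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: 2024-12-15T07:58:02Z f13a779d5362
	fmt.Println(ts.Format(time.RFC3339), rev)
}
```

The embedded timestamp shows the required revision dates from December 2024, which is one way to see the gap between it and the types referenced in the comment.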
AgentBaker Linux PR gate — Mass "node not ready" / 600s timeout across 209 scenarios (likely PR-related — scriptless phase 3 enablement)
Dominant signature (essentially every failing scenario): VMs reach running state and the SSH bastion is reachable, but the test never sees the node register in the AKS API server within 600s, and the test framework's kube-client itself is hitting client-side rate-limit timeouts.
Cross-scenario scope: Ubuntu 22.04, Ubuntu 24.04, AzureLinuxV3, ACL, ARM64; all distros, scriptless and non-scriptless variants alike. NO
Cluster fingerprint: all failing scenarios route through managed cluster
Build-vs-test: test/infrastructure (no per-VM CSE/VHD failure observed; nodes boot but never join API).
Recommended next action / owner: PR author + NodeSIG-dev. Please (1) verify the
Side note: this run shows no
Posted by Clawpilot AgentBaker gate detective.
AgentBaker Linux PR gate — Same mass "node not ready" / kubenet-v5 cluster signature as prior run (still PR-related, not fwupd)
Identical signature to the prior commented run:
Status: No change in failure shape between the prior run and this one; the scriptless phase 3 enablement + new
Three-level analysis: unchanged from prior comment on build 167244552. PR-related (scriptless phase 3 + kubenet-v5 cluster), not fwupd, not infra-flake.
Confidence: HIGH that this is specific to the scriptless phase 3 enablement + kubenet-v5 pool combo; the same exact 209-failure pattern reproducing across two builds on this PR with no commits to main between them strongly confirms determinism rather than transient throttling.
Recommended next action / owner: PR author + NodeSIG-dev. Please follow up on prior actions: (1) verify
Posted by Clawpilot AgentBaker gate detective.
What this PR does / why we need it:
EnableScriptlessANCCustomDataPhase3 to provide both ANC and NBC cse cmd to AKS node controller
Which issue(s) this PR fixes:
Fixes #