chore(deps): update kubelet-kubectl (patch)#8600

Open
renovate[bot] wants to merge 1 commit into main from renovate/patch-kubelet-kubectl

@renovate renovate Bot commented May 28, 2026

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| kubectl | | patch | 1.35.4-3.azl3 → 1.35.5-2.azl3 |
| kubectl | 1.34.6-3.azl3 | patch | 1.34.7-3.azl3 → 1.34.8-2.azl3 |
| kubectl | | patch | 1.35.4-ubuntu24.04u3 → 1.35.5-ubuntu24.04u2 |
| kubectl | 1.34.6-ubuntu24.04u3 | patch | 1.34.7-ubuntu24.04u3 → 1.34.8-ubuntu24.04u2 |
| kubectl | | patch | 1.35.4-ubuntu22.04u3 → 1.35.5-ubuntu22.04u2 |
| kubectl | 1.34.6-ubuntu22.04u3 | patch | 1.34.7-ubuntu22.04u3 → 1.34.8-ubuntu22.04u2 |
| kubectl | 1.34.6-ubuntu20.04u3 | patch | 1.34.7-ubuntu20.04u3 → 1.34.8-ubuntu20.04u2 |
| kubelet | | patch | 1.35.4-1.azl3 → 1.35.5-1.azl3 |
| kubelet | 1.34.6-1.azl3 | patch | 1.34.7-1.azl3 → 1.34.8-1.azl3 |
| kubelet | | patch | 1.35.4-ubuntu24.04u1 → 1.35.5-ubuntu24.04u1 |
| kubelet | 1.34.6-ubuntu24.04u1 | patch | 1.34.7-ubuntu24.04u1 → 1.34.8-ubuntu24.04u1 |
| kubelet | | patch | 1.35.4-ubuntu22.04u1 → 1.35.5-ubuntu22.04u1 |
| kubelet | 1.34.6-ubuntu22.04u1 | patch | 1.34.7-ubuntu22.04u1 → 1.34.8-ubuntu22.04u1 |
| kubelet | 1.34.6-ubuntu20.04u1 | patch | 1.34.7-ubuntu20.04u1 → 1.34.8-ubuntu20.04u1 |

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about these updates again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

Copilot AI review requested due to automatic review settings May 28, 2026 08:25
@renovate renovate Bot added the renovate label (This pull request was created by renovate) May 28, 2026
@renovate renovate Bot requested a review from a team May 28, 2026 08:25

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions github-actions Bot added the components label (This pull request updates cached components on Linux or Windows VHDs) May 28, 2026
@renovate renovate Bot changed the title from "chore(deps): update kubelet-kubectl (patch)" to "chore(deps): update kubelet-kubectl (patch) - autoclosed" May 28, 2026
@renovate renovate Bot closed this May 28, 2026
@renovate renovate Bot deleted the renovate/patch-kubelet-kubectl branch May 28, 2026 11:48
@renovate renovate Bot changed the title from "chore(deps): update kubelet-kubectl (patch) - autoclosed" to "chore(deps): update kubelet-kubectl (patch)" May 28, 2026
@renovate renovate Bot reopened this May 28, 2026
@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from 5284d06 to a8cc93f May 28, 2026 19:04
Copilot AI review requested due to automatic review settings May 28, 2026 20:38
@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from a8cc93f to 3a9838a May 28, 2026 20:38

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from 3a9838a to 67edbb9 May 29, 2026 01:32
Copilot AI review requested due to automatic review settings May 29, 2026 15:47
@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from 67edbb9 to d00cbfd May 29, 2026 15:47

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from d00cbfd to c1c68b8 May 29, 2026 16:34
Copilot AI review requested due to automatic review settings May 29, 2026 20:31
@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from c1c68b8 to 9ff88a5 May 29, 2026 20:31

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch 2 times, most recently from cc5a960 to 0348a15 May 29, 2026 22:24
Copilot AI review requested due to automatic review settings May 29, 2026 22:24

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from 0348a15 to 35338a6 May 30, 2026 01:40
Copilot AI review requested due to automatic review settings May 30, 2026 04:09
@renovate renovate Bot force-pushed the renovate/patch-kubelet-kubectl branch from 35338a6 to 9532b7d May 30, 2026 04:09

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — Ubuntu 24.04 fwupd.service mass E2E failure (RECURRING main regression, NOT this PR)

  • Run: 167241354 (failed)
  • Failed task: Run AgentBaker E2E (Stage e2e → Job/Phase Run AgentBaker E2E)
  • Test summary: DONE 438 tests, 95 skipped, 15 failures in 1573.003s
  • Primary signature: validators.go:995: 🔴 FAIL: the following systemd units have unexpectedly entered a failed state: [fwupd.service] (8 hits)

Failing scenarios — all Ubuntu 24.04:
Test_Ubuntu2404_CSE_CachedPerformance, Test_Ubuntu2404_CSE_FullInstallPerformance, Test_Ubuntu2404_NPD_Basic, Test_Ubuntu2404_Scriptless, Test_Ubuntu2404_SecureTLSBootstrapping_BootstrapToken_Fallback, Test_Ubuntu2404Gen2, Test_Ubuntu2404Gen2_McrChinaCloud.

Same fwupd.service 24.04 main regression previously flagged on builds 167206065, 167219726, 167221197, 167238023. This PR (kubelet/kubectl patch bump) does not touch fwupd or VHD systemd unit config.

Build-vs-test: product/VHD regression caught by E2E (NOT a flake, NOT test-code).
Confidence: HIGH that PR #8600 is not the cause; HIGH that this is a 24.04 VHD main regression.
Strongest alternative (less likely): kubelet/kubectl patch bump indirectly tripping fwupd.service startup — refuted: same signature reproduces on unrelated PRs (renovate node-exporter #8294, STLS refactor #8618, secondary-nics #8642) on the same main HEAD; scope strictly 24.04; no kubelet linkage to fwupd.

Recommended next action / owner: NodeSIG-dev — main-branch fix still pending. Likely mitigation: mask fwupd.service in 24.04 VHD or fix the first-start dependency in vhdbuilder/packer/install-dependencies.sh / tool_installs_distro.sh. PR author: do NOT block merge on this; rebase + rerun once the main fix lands.
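
For readers unfamiliar with the signature above, here is a minimal sketch of the kind of check that reports "systemd units have unexpectedly entered a failed state". It is not the code in validators.go. It assumes the input is the text printed by `systemctl list-units --state=failed --plain --no-legend`, and the function name and the `allowed` parameter are hypothetical.

```python
from __future__ import annotations


def unexpected_failed_units(systemctl_output: str,
                            allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return failed unit names that are not explicitly tolerated.

    Assumes rows shaped like: UNIT LOAD ACTIVE SUB DESCRIPTION...
    """
    failed = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        # Skip blank or malformed rows; column 3 (ACTIVE) carries the state.
        if len(fields) < 3:
            continue
        unit, active = fields[0], fields[2]
        if active == "failed" and unit not in allowed:
            failed.append(unit)
    return failed


# Sample row shaped like the failure reported in this run.
SAMPLE = "fwupd.service loaded failed failed Firmware update daemon\n"
print(unexpected_failed_units(SAMPLE))
```

On the sample row the function reports `fwupd.service`. Passing the unit in `allowed` would only silence the check; the mitigations named above (masking the unit in the 24.04 VHD or fixing its first-start dependency) address the failure itself.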

Posted by Clawpilot AgentBaker gate detective.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — Ubuntu 24.04 fwupd.service mass E2E failure (RECURRING main regression, NOT this PR)

  • Run: 167255168 (failed)
  • Failed task: Run AgentBaker E2E
  • Test summary: DONE 438 tests, 95 skipped, 17 failures in 1761.428s
  • Primary signature: validators.go:995: 🔴 FAIL: the following systemd units have unexpectedly entered a failed state: [fwupd.service] (9 hits)

Failing scenarios — all Ubuntu 24.04: Test_LocalDNSHostsPlugin (24.04 legs), Test_Ubuntu2404_CSE_CachedPerformance, Test_Ubuntu2404_CSE_FullInstallPerformance, Test_Ubuntu2404_NPD_Basic, Test_Ubuntu2404Gen2, Test_Ubuntu2404Gen2_McrChinaCloud, Test_Ubuntu2404Gen2_McrChinaCloud_Scriptless.

Same fwupd.service 24.04 main regression previously flagged on builds 167206065, 167219726, 167221197, 167238023, 167241354. This PR (kubelet/kubectl patch bump) does not touch fwupd or VHD systemd unit config.

Build-vs-test: product/VHD regression caught by E2E (NOT a flake, NOT test-code).
Confidence: HIGH that PR #8600 is not the cause; HIGH that this is a 24.04 VHD main regression.
Strongest alternative (less likely): kubelet/kubectl patch indirectly tripping fwupd.service — refuted: same signature reproduces on unrelated PRs (#8294, #8618, #8642, #8652) on the same main HEAD; scope strictly 24.04.

Recommended next action / owner: NodeSIG-dev — main fix still pending. Likely mitigation: mask fwupd.service in 24.04 VHD or fix the first-start dependency in vhdbuilder/packer/install-dependencies.sh / tool_installs_distro.sh. PR author: do NOT block merge on this; rebase + rerun once the main fix lands.

Posted by Clawpilot AgentBaker gate detective.

@aks-node-assistant

Fwupd 24.04 gate regression — fix incoming, no action needed on this PR

The Ubuntu 24.04 [fwupd.service] mass failure flagged on your prior gate run is now tracked and being fixed.

Once #8662 merges, rerun the gate on this PR. No code change required on your side.

Posted by Clawpilot AgentBaker gate detective.

@aks-node-assistant

AgentBaker Linux PR gate — debugnonhost-mariner-tolerated daemonset write conflict on shared cluster (test-infra race, NOT this PR; fwupd 24.04 NOT present in this run)

  • Run: 167370141 (failed)
  • Failed task: Run AgentBaker E2E
  • Test summary: DONE 457 tests, 95 skipped, 16 failures in 1610.656s (0 fwupd hits — so this is NOT the 24.04 main regression)

Failing scenarios — all on the shared abe2e-azure-overlay-v4 cluster: Test_Random_VHD_With_Latest_Kubernetes_Version and the Test_Ubuntu2204Gen2_ImagePullIdentityBinding_* family (Disabled, Disabled_Scriptless, Enabled, Enabled_Scriptless, EnabledWithoutDefaultIDs).

Exact failure signature (identical across all failing scenarios):

prepare cluster tasks: dag execution failed:
Operation cannot be fulfilled on daemonsets.apps "debugnonhost-mariner-tolerated":
the object has been modified; please apply your changes to the latest version and try again

All scenarios fail at the cluster-prep stage in <10s with a kube-apiserver 409-style optimistic-concurrency conflict on the shared debugnonhost-mariner-tolerated DaemonSet. Multiple parallel scenarios in the same gate run are racing to update the same DaemonSet (probably to apply tolerations/imagePull config) on the shared abe2e-azure-overlay-v4 cluster. Whoever loses the resourceVersion race fails the scenario.

Three-level analysis:

  1. L1: kube apiserver optimistic-concurrency 409 on a shared DaemonSet during cluster-prep DAG.
  2. L2 corroboration: all failures are sub-10s (no VM provisioning attempted); restricted to scenarios sharing the same cluster; failure message is the canonical k8s the object has been modified; please apply your changes to the latest version and try again. No node/VHD/CSE involvement. PR chore(deps): update kubelet-kubectl (patch) #8600 is a kubelet/kubectl patch bump — the conflict happens on the test-harness DaemonSet apply call, not in any node-side code.
  3. L3 challenge: alternatives — (a) PR-caused: kubelet/kubectl version bump triggering apiserver behavior change → refuted, the conflict comes from the e2e harness's own DaemonSet apply, not kubelet/kubectl runtime; (b) shared cluster broken: possible but more narrowly framed as a missing retry-on-conflict in the harness's prepare cluster tasks DAG. Strongest alt: harness retry loop on DaemonSet apply is missing/insufficient — most likely root cause and fix point.

Build-vs-test: test-infra/harness (NOT product, NOT PR-caused).
Confidence: HIGH that PR #8600 is not the cause; HIGH that this is a DaemonSet apply-race in the prepare-cluster DAG.

Recommended next action / owner: E2E harness owner / NodeSIG-dev — wrap the debugnonhost-mariner-tolerated DaemonSet apply in prepare cluster tasks with a get-modify-retry-on-conflict loop (standard k8s retry.RetryOnConflict pattern), or serialize the apply across scenarios in the same gate run. PR author: do NOT block merge on this; this is an E2E fixture race unrelated to the kubelet/kubectl patch.
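
The get-modify-retry-on-conflict loop recommended above can be sketched as follows. This is a language-neutral illustration in Python against an in-memory stand-in, not the harness code: the harness is Go and would presumably use client-go's `retry.RetryOnConflict`. `FakeStore`, `Conflict`, and `update_with_retry` are hypothetical names, and the integer version plays the role of `resourceVersion`.

```python
import threading


class Conflict(Exception):
    """Raised when an update carries a stale version (like an HTTP 409)."""


class FakeStore:
    """In-memory stand-in for one apiserver object with a version counter."""

    def __init__(self, obj):
        self._lock = threading.Lock()
        self._obj = dict(obj)
        self._version = 1

    def get(self):
        with self._lock:
            return dict(self._obj), self._version

    def update(self, obj, version):
        with self._lock:
            if version != self._version:
                raise Conflict("the object has been modified")
            self._obj = dict(obj)
            self._version += 1


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest object and re-apply `mutate` after each conflict."""
    for _ in range(attempts):
        obj, version = store.get()   # always start from the latest version
        mutate(obj)                  # apply the change to the fresh copy
        try:
            store.update(obj, version)
            return
        except Conflict:
            continue                 # someone else won the race; try again
    raise Conflict(f"still conflicting after {attempts} attempts")
```

The key point is that the mutation is re-applied to a freshly read object on every attempt, so a scenario that loses the race picks up the winner's changes instead of failing or overwriting them.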

Posted by Clawpilot AgentBaker gate detective.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — 236-failure mass run: shared cluster proxy-pod readiness + ResourceGroupBeingDeleted (test-infra, NOT this PR)

  • Run: 167387387 (failed)
  • Failed task: Run AgentBaker E2E
  • Test summary: DONE 402 tests, 95 skipped, 236 failures in 651.509s (~59% failure rate; 0 fwupd hits)
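
The "~59% failure rate" figure follows directly from the summary line (236 of 402 tests). A minimal sketch of that arithmetic, assuming the summary always has the shape quoted in this thread; `parse_summary` and the regular expression are hypothetical helpers, not part of the gate tooling:

```python
import re

# Matches lines such as "DONE 402 tests, 95 skipped, 236 failures in 651.509s".
# The optional "~" covers approximate durations like "in ~3616s".
SUMMARY_RE = re.compile(
    r"DONE (\d+) tests, (\d+) skipped, (\d+) failures in ~?(\d+(?:\.\d+)?)s"
)


def parse_summary(line):
    """Extract counts and duration, and derive failures / tests."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    tests, skipped, failures = (int(g) for g in match.groups()[:3])
    return {
        "tests": tests,
        "skipped": skipped,
        "failures": failures,
        "seconds": float(match.group(4)),
        "failure_rate": failures / tests if tests else 0.0,
    }


summary = parse_summary("DONE 402 tests, 95 skipped, 236 failures in 651.509s")
print(f"{summary['failure_rate']:.1%}")
```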

Same shape as concurrent runs in this window: dominant prepare cluster tasks: dag execution failed: waiting for proxy pod to be ready: ... client rate limiter Wait returned an error: context deadline exceeded (133+ scenarios) plus ~20 RESPONSE 409: ResourceGroupBeingDeleted on abe2e-kubenet-v5-* MC RGs.

Three-level analysis:

  1. L1: proxy DaemonSet readiness + RG teardown collisions on the shared cluster fleet.
  2. L2 corroboration: identical 236-failure signature on PR chore(deps): update runc-containerd-minor to v2.3.1-ubuntu24.04u2 #8652 build 167387444 and PR chore(deps): update node-exporter-kubernetes (patch) #8294 build 167387406 in the same window — three unrelated PRs with the same shape. PR chore(deps): update kubelet-kubectl (patch) #8600 (kubelet/kubectl patch bump) doesn't touch e2e proxy or AKS test-fixture lifecycle.
  3. L3 challenge: "kubelet/kubectl patch bump breaks e2e proxy DaemonSet readiness" — refuted by cross-PR pattern. Strongest alt: shared kubenet-v5/networkisolated-v2 cluster pool stress, same as flagged on builds 167378787, 167387444, 167387406.

Build-vs-test: test-infra, NOT product, NOT PR-caused.
Confidence: HIGH that PR #8600 is not the cause.

Recommended next action / owner: E2E infra / NodeSIG-dev — shared cluster fleet stabilization. PR author: do NOT block merge; rerun once fleet stabilizes.

Posted by Clawpilot AgentBaker gate detective.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — 236-failure run: shared cluster fleet outage continues (test-infra, NOT this PR)

  • Run: 167422677 (failed)
  • Failed task: Run AgentBaker E2E (full 60-minute timeout consumed)
  • Test summary: DONE 402 tests, 95 skipped, 236 failures in ~3616s (~59% failure rate; 0 fwupd hits)

Same shared cluster fleet outage affecting every concurrent PR in this window: 123× get or create cluster: failed to wait for cluster abe2e-kubenet-v5-150ee to be ready: context deadline exceeded. Earlier overnight runs hit ~11 min; current runs consume the full 60-min E2E timeout, indicating the fleet is worse, not recovering.

Cross-PR pattern this morning: PR #8652 build 167419663, PR #8679 build 167421198, PR #8294 build 167422687, and concurrent PRs all hit identical 236-fail / cluster-not-ready signature.

Build-vs-test: test-infra (shared cluster fleet outage), NOT product, NOT PR-caused.
This PR's exposure check: kubelet/kubectl renovate patch bump. No path to shared test cluster lifecycle.
Confidence: HIGH that PR #8600 is not the cause.

Recommended next action / owner: ⚠️ E2E infra / NodeSIG-dev — urgent shared cluster fleet restoration required (abe2e-kubenet-v5-*, abe2e-latest-kubernetes-version-v2-*, abe2e-azure-networkisolated-v2-*, abe2e-azure-v4-*, abe2e-azure-bootstrapprofile-cache-v2-*); clear ResourceGroupDeletionBlocked locks. PR gate is effectively offline until restored. PR author: rerun once fleet recovers.

Posted by Clawpilot AgentBaker gate detective.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — single E2E failure on shared cluster fixture (NOT this PR)

  • Run: 167476504
  • Failed job/task: Run AgentBaker E2E (only Test_Ubuntu2204_AzureCNI/default failed; all other scenarios and all VHD builds passed)
  • Wiki signature: arm-409-cluster-fixture-contention (wiki)

Detective summary

Shared E2E cluster fixture abe2e-azure-overlay-network-v4-b4fc3 (rg abe2e-westus3) was found in Failed state at scenario start. The test deleted it (~101s) and then tried to recreate, but every retry returned 409 Conflict:

  • Attempt 1: EtagMismatch / FailedPrecondition (rpc error: ... Etag mismatched).
  • Attempts 2–N: OperationNotAllowed — "in progress create managed cluster operation (operation ID: 6250a1b9-ec6d-4e34-b0e8-da6637ee8c53) … started on UTC 2026-06-10T14:26:10Z. Please wait for it to finish".

That operation ID is owned by another concurrent build racing the same shared fixture; this run kept losing the race for the entire retry window.

Likely cause / classification: Test infrastructure contention on a shared cluster fixture, not a PR regression. Same class as the existing wiki row arm-409-cluster-fixture-contention (Azure ARM 409 AnotherOperationInProgress).

Confidence: High.

Strongest alternative theory: Transient ARM/HCP outage — less likely, because the 409 is deterministic on one specific cluster name and explicitly references a concrete in-progress operation ID held by another caller. That is contention, not a provider outage.

Recommended next action / owner: No PR change required. Recommend a retry/rerun of the failed leg only. Test-infra owners (AgentBaker E2E / SIG Node Lifecycle) should look at serializing or randomizing the abe2e-azure-overlay-network-v4 shared fixture name across concurrent PR runs to stop these head-on races; this is the second build in the wiki with the same pattern.

Evidence used: failed task log (1.8k lines, only one === FAIL for Test_Ubuntu2204_AzureCNI/default), all other E2E scenarios passed, all VHD builds passed, PR is a renovate kubelet-kubectl patch (no infra changes).

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — Test_Ubuntu2204_HTTPSProxy_PrivateDNS proxy fixture unreachable (NOT this PR; tracked by repair item)

  • Run: 167599797
  • Failed job: Run AgentBaker E2E (only HTTPSProxy_PrivateDNS subtests; all VHD builds passed)
  • Wiki signature: httpsproxy-fixture-proxy-unreachable (wiki) — already escalated, tracked by repair item #38383391.

Detective summary

Same pattern as builds 167493131, 167505019, 167534982, 167535509, 167536780, 167567923, 167570253: vmssCSE exits 99 because apt-get update cannot reach the HTTPSProxy_PrivateDNS scenario's HTTP proxy in 10.14.0.0/24. Eighth occurrence of this signature; well past the escalation threshold.

Classification: Test infrastructure / scenario fixture flakiness.

Confidence: High. PR #8600 is a renovate kubelet-kubectl patch — no CSE/proxy/apt change in scope.

Strongest alternative theory: Kubelet/kubectl patch interferes with apt's network path. Less likely — failure is a TCP-level connect refused/no route to host/timed out against the private fixture proxy endpoint, completely independent of kubelet binary version.

Recommended next action / owner: No PR change required. Recommend rerun. Underlying issue is owned by AgentBaker E2E test-infra and tracked by ADO Bug #38383391.

Evidence used: failed task log (3 === FAIL for HTTPSProxy_PrivateDNS, vmssCSE exit 99 against fixture proxy in 10.14.0.0/24), all other E2E and all VHD builds passed.

@aks-node-assistant

🕵️ AgentBaker Linux Gate Detective – Build 167653228 (PR check-in gate, def 119535)

Failed job: Run AgentBaker E2E (stage e2e) – Script exit 1.

Summary: DONE 466 tests, 95 skipped, 15 failures – every single failure is a *NetworkIsolatedCluster* / *NetworkIsolated_Package_Install scenario across AzureLinuxV3, Ubuntu2204, Ubuntu2204_ArtifactStreaming, and ACL, in both default and scriptless_nbc bootstrap modes. All 15 fail at CSE pre-flight with vmssCSE ExitCode 52 after a 300s nslookup -timeout=15 -retry=0 abe2e-azure-networkisolated-v3-<suffix>.hcp.westus3.azmk8s.io against 169.254.10.10 returning NXDOMAIN. No non-NetworkIsolated scenario failed.

3-level RCA

  1. Surface: CSE validator VALIDATION_ERR=52 from the apiserver-FQDN DNS pre-check on every NetworkIsolated VMSS.
  2. Corroboration (≥2 sources): (a) E2E task log 538 shows the same NXDOMAIN payload for the identical apiserver FQDN across 5 distinct distros × 2 bootstrap modes = 15/15 NetworkIsolated tests; (b) build-status “Issues” lists only Run AgentBaker E2E (no VHD-build errors – the SSH warnings are pre-existing/non-blocking); (c) non-NetworkIsolated scenarios in the same run are unaffected → it’s the NetworkIsolated fixture path (private DNS zone / private-link for the hcp FQDN), not the PR.
  3. Root-cause challenge — strongest alternative: The PR is a Renovate kubelet-kubectl version bump. Could it break CSE? Rejected: the failure fires in CSE’s pre-kubelet DNS probe (no kubelet binary involved yet), and it’s uniform on a single private apiserver FQDN across multiple OS images, so a kubelet version change cannot be the cause. Other ruled-out alternatives: shared-cluster-fleet-outage (scoped to kubenet-v5 VMSS-create timeouts, different cluster family); localdns-exporter-* (post-CSE assertions, not CSE-time nslookup); httpsproxy-fixture-proxy-unreachable (different fixture, port 8888 proxy refusal, not DNS NXDOMAIN).

Flaky vs deterministic: Deterministic within this build (15/15 NetworkIsolated scenarios fail identically on the same private FQDN); first observed occurrence across watcher history → tracked as a new signature pending recurrence data.

Build-vs-test class: Test infrastructure (NetworkIsolated shared-cluster fixture / private DNS).

Signature: networkisolated-apiserver-fqdn-nxdomain (new – not present in wiki source-of-truth yet)
Classification: Test infrastructure flakiness / private-DNS reachability
Confidence: High
Strongest alternative theory: Renovate kubelet-kubectl bump regressing CSE (rejected – failure precedes kubelet and is fixture-scoped).

Recommended next action / owner: AgentBaker E2E test-infra – verify the private DNS zone / private-link for *.hcp.westus3.azmk8s.io on the NetworkIsolated shared cluster abe2e-azure-networkisolated-v3-kq4wzvpl and the upstream resolver wired into 169.254.10.10. Safe to rerun the failed E2E job before assuming a code issue; rebase or push only if recurrence persists.
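
The pre-flight behaviour described above (poll DNS for the apiserver FQDN within a 300s budget, then exit with code 52) can be pictured with the sketch below. It is not the CSE script, which is shell and uses nslookup against 169.254.10.10. The Python standard library cannot target a specific resolver, so the resolver is injected, and `wait_for_dns` and its parameters are hypothetical.

```python
import socket
import time

DNS_PRECHECK_EXIT_CODE = 52  # exit code quoted in the comment above


def wait_for_dns(fqdn, resolve=socket.gethostbyname, budget_s=300.0,
                 interval_s=15.0, clock=time.monotonic, sleep=time.sleep):
    """Return 0 once `fqdn` resolves, or 52 when the time budget runs out."""
    deadline = clock() + budget_s
    while True:
        try:
            resolve(fqdn)
            return 0
        except OSError:
            # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError.
            pass
        if clock() + interval_s > deadline:
            return DNS_PRECHECK_EXIT_CODE
        sleep(interval_s)
```

Because the loop only asks whether the name resolves, a persistent NXDOMAIN for the private FQDN produces the same exit code on every image and bootstrap mode, which is consistent with the uniform 15/15 failure pattern reported here.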

Wiki source-of-truth: AgentBaker Gate PR Pipeline Flakiness – signature networkisolated-apiserver-fqdn-nxdomain will be merged centrally to avoid concurrent edit conflicts.

Posted by Clawpilot AgentBaker Linux Gate Detective Watcher. No raw private logs included.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@aks-node-assistant

AgentBaker Linux PR gate — E2E failure (shared test-fixture issue, NOT this PR)

  • Run: 167741298 (failed, Stage e2e → Run AgentBaker E2E)
  • PR: chore(deps): update kubelet-kubectl (patch) #8600 (renovate/patch-kubelet-kubectl) — Renovate-only kubelet/kubectl bump, no provisioning code touched
  • Leaf failures: 133 across 12 shared cluster families (kubenet-v5, networkisolated-v3, azure-network-v4, overlay-network-v4, bootstrapprofile-cache-v2, latest-kubernetes-version-v2, fw-rt …), uniform across Ubuntu2204/Ubuntu2404/AzureLinuxV3/ACL and both default + scriptless_nbc bootstrap modes

3-level RCA

  1. Surface: Run AgentBaker E2E exits 1; mass leaf failures heterogeneous across distros / scenarios / bootstrap modes / shared clusters.
  2. Mechanism: 115× kube.go:195 … haven't appeared in k8s API server: context deadline exceeded (600–740s) + 71× kube.go:166 client-rate-limiter starvation cascade. VMSS instances never register with the apiserver.
  3. Root cause: Shared-cluster control-plane brownout / fleet outage — apiserver registration timing out across many unrelated underlay families simultaneously. Not PR-correlated.

Classification: Test infrastructure flakiness (build-vs-test class: test-infra). Deterministic-looking within this build, but cross-build pattern is flaky/intermittent (shared-fixture). Confidence: high.

Signature: shared-cluster-fleet-outage — Active, repair item #38373323 already open. This build is the 16th occurrence.

Strongest alternative considered: kubenet-v5-node-not-ready-scriptless (97/133 leaf failures hit abe2e-kubenet-v5-150ee). Disconfirmed because failures are not localized to kubenet-v5 — 36 leaf failures span 11 other shared-cluster families with the same kube.go:195 signature, which matches the "12+ families" criterion of shared-cluster-fleet-outage, not the "localized to kubenet-v5" criterion of the alternative.
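
The disambiguation above reduces to a rule on how many shared-cluster families are affected. A minimal sketch of that rule, assuming the "12+ families" and "localized to kubenet-v5" criteria as quoted; `classify_mass_failure` is a hypothetical helper, the `other-family-N` names are placeholders, and the per-family split of the 36 non-kubenet failures is invented for illustration (only the totals 97, 36, and 133 come from the comment).

```python
def classify_mass_failure(failures_by_family, fleet_families=12,
                          localized="kubenet-v5"):
    """Pick a signature from per-family leaf-failure counts."""
    affected = {fam for fam, count in failures_by_family.items() if count > 0}
    if len(affected) >= fleet_families:
        return "shared-cluster-fleet-outage"
    if affected == {localized}:
        return "kubenet-v5-node-not-ready-scriptless"
    return "unclassified"


# 97 leaf failures on kubenet-v5 plus 36 spread over 11 other families.
# The split of the 36 is made up; placeholder names stand in for the families.
other_counts = [4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
build = {"kubenet-v5": 97}
build.update({f"other-family-{i}": n for i, n in enumerate(other_counts, start=1)})

print(sum(build.values()), classify_mass_failure(build))
```

With 12 affected families the rule selects shared-cluster-fleet-outage even though most failures sit on one cluster, which mirrors the reasoning in the comment.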

Recommended action: Re-run the failed stage when the shared fleet recovers; no PR-side change required. No new repair item — escalation already tracked under #38373323.

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Labels

  • components (This pull request updates cached components on Linux or Windows VHDs)
  • renovate (This pull request was created by renovate)

2 participants