Skip to content

Fix reconcile loop caused by non-deterministic env var ordering in di…#4783

Merged
lujunsan merged 2 commits intomainfrom
fix-vmcp-discovered-auth-env-var-ordering
Apr 16, 2026
Merged

Fix reconcile loop caused by non-deterministic env var ordering in di…#4783
lujunsan merged 2 commits intomainfrom
fix-vmcp-discovered-auth-env-var-ordering

Conversation

@lujunsan
Copy link
Copy Markdown
Contributor

Summary

Deployments for VirtualMCPServer with outgoingAuth.source: discovered and 4+
MCPExternalAuthConfigs entered a continuous update loop (~10 updates/sec), causing
constant pod rolling restarts and dropped MCP client sessions.

  • The root cause is the same class of bug fixed in fix(#3448) - vMCP operator deterministically orders servers #3450: non-deterministic iteration
    order (Go map iteration over Spec.OutgoingAuth.Backends; K8s informer cache ordering
    for typedWorkloads) caused discoverExternalAuthConfigSecrets and
    discoverInlineExternalAuthConfigSecrets to return TOOLHIVE_TOKEN_EXCHANGE_CLIENT_SECRET_*
    env vars in a different sequence on every reconcile.
  • containerNeedsUpdate compares env vars with reflect.DeepEqual, which is
    order-sensitive on slices, so every reconcile saw a spurious diff and triggered a
    deployment update, which triggered the next reconcile, ad infinitum.
  • Fix: sort the returned []corev1.EnvVar by .Name before returning from both
    functions. Env var ordering has no semantic meaning in Kubernetes; the vMCP runtime
    looks up secrets by name via os.Getenv, not by slice position.

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)

Two new tests added:

  • TestDiscoverExternalAuthConfigSecrets_DeterministicOrdering — table-driven, exercises
    both reverse-alphabetical and mixed workload orderings, asserts output is always sorted
  • TestDiscoverInlineExternalAuthConfigSecrets_DeterministicOrdering — exercises Go map
    non-determinism in Spec.OutgoingAuth.Backends, asserts output is always sorted

Both tests were written first (failing), then the sort was added (passing) — confirmed
the tests catch the bug before the fix.

Does this introduce a user-facing change?

Yes. VirtualMCPServer deployments using outgoingAuth.source: discovered with 4+
MCPExternalAuthConfigs will no longer enter a continuous update/restart loop.

Special notes for reviewers

buildCABundleVolumesForEntries also iterates typedWorkloads to produce ordered slices,
but volumes are not compared in any *NeedsUpdate check — so it does not contribute to
this loop and is out of scope for this fix.

The ConfigMap side (discoverExternalAuthConfigs) was also reviewed: it populates a
map[string]*BackendAuthStrategy which yaml.v3 serialises with sorted keys, so the
ConfigMap content and its checksum are already deterministic.

Generated with Claude Code

…scovered auth mode

Signed-off-by: lujunsan <luisjuncaldev@gmail.com>
@github-actions github-actions bot added the size/M Medium PR: 300-599 lines changed label Apr 13, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.21%. Comparing base (c2a8154) to head (2f95f64).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4783      +/-   ##
==========================================
+ Coverage   69.11%   69.21%   +0.09%     
==========================================
  Files         533      533              
  Lines       55251    55257       +6     
==========================================
+ Hits        38186    38244      +58     
+ Misses      14134    14069      -65     
- Partials     2931     2944      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread cmd/thv-operator/controllers/virtualmcpserver_deployment.go
Comment thread cmd/thv-operator/controllers/virtualmcpserver_deployment.go
@gastoncan
Copy link
Copy Markdown

Any possibility to check progress on this?
It is destabilizing my vMCP with continuous restarts.
Appreciate your effort :)

@github-actions github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Apr 16, 2026
@lujunsan lujunsan requested a review from jerm-dro April 16, 2026 13:12
@lujunsan lujunsan merged commit 3c5ac7e into main Apr 16, 2026
70 of 71 checks passed
@lujunsan lujunsan deleted the fix-vmcp-discovered-auth-env-var-ordering branch April 16, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Medium PR: 300-599 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants