fix(connector): propagate liveness to the edge so online connectors aren't treated as offline #208
Merged
Connector Ready + connectionDetails are computed authoritatively in the Project control plane and stored in the Connector status subresource. The edge extension server runs on a member cluster and reads connectors from the local cache to classify a tunnel online/offline. The replicator's downstream write lands on the Karmada hub resourcetemplate, and Karmada propagates only spec + metadata (labels/annotations) to members, NOT the status subresource. So the member-cluster Connector carries spec only, every connector reads as offline on the data plane, and online promotion can never fire.

Karmada has no native push-status-down mechanism, but it does propagate template annotations to members. Carry liveness down through an annotation instead of status:

- Replicator stamps networking.datumapis.com/connector-liveness onto the downstream Connector metadata (a compact JSON snapshot: ready + nodeID), written via the normal CreateOrUpdate Update (not the status subresource).
- Extension server reads the annotation first and falls back to status when it is absent/unparseable (single-cluster and pre-rollout objects).

The annotation carries only status-derived fields the ext-server consumes: ready (from the Ready condition) and nodeID (from status.connectionDetails.publicKey.id). TargetHost/TargetPort are NOT carried because they derive from the HTTPProxy backend endpoint, not connector status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
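A minimal sketch of the stamp-and-read round trip this commit describes, assuming a snapshot struct with ready and nodeID fields. The annotation key comes from the commit message; the type name, function names, and JSON tags are hypothetical and are not taken from the project's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as named in the commit message.
const livenessAnnotation = "networking.datumapis.com/connector-liveness"

// livenessSnapshot is a hypothetical shape for the compact JSON snapshot.
// The field names and JSON tags are assumptions for illustration.
type livenessSnapshot struct {
	Ready  bool   `json:"ready"`
	NodeID string `json:"nodeID,omitempty"`
}

// stampLiveness serialises the snapshot into the annotation map, which is
// ordinary object metadata rather than the status subresource.
func stampLiveness(annotations map[string]string, snap livenessSnapshot) (map[string]string, error) {
	raw, err := json.Marshal(snap)
	if err != nil {
		return annotations, err
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[livenessAnnotation] = string(raw)
	return annotations, nil
}

// readLiveness returns ok=false when the annotation is absent or
// unparseable, which tells the caller to fall back to the live status.
func readLiveness(annotations map[string]string) (livenessSnapshot, bool) {
	raw, found := annotations[livenessAnnotation]
	if !found {
		return livenessSnapshot{}, false
	}
	var snap livenessSnapshot
	if err := json.Unmarshal([]byte(raw), &snap); err != nil {
		return livenessSnapshot{}, false
	}
	return snap, true
}

func main() {
	ann, err := stampLiveness(nil, livenessSnapshot{Ready: true, NodeID: "node-a"})
	if err != nil {
		panic(err)
	}
	snap, ok := readLiveness(ann)
	fmt.Println(snap.Ready, snap.NodeID, ok)
}
```

Returning a boolean instead of an error keeps the fallback decision at the call site, which is one plausible way to treat absent and unparseable annotations identically.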
57953fa to 1d645ee
The connector liveness annotation previously carried a reduced
{ready, nodeID} snapshot, hard-coding the assumption that the tunnel
node ID is the PublicKey id. Carry the upstream Connector's full
status.connectionDetails (type discriminator + type-specific block)
alongside ready instead, so additional connection types can be
supported without changing the annotation's JSON schema.
The extension server now derives the tunnel node ID via a type-aware
ConnectorConnectionDetails.TunnelNodeID() that switches on the
connection type, with an empty-string default for unknown types, and
keeps the existing status fallback when the annotation is absent or
unparseable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
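As a rough illustration of the type-aware lookup described in this commit, the sketch below switches on a connection type and returns an empty string for unknown types. Only the method name ConnectorConnectionDetails.TunnelNodeID() comes from the commit message; the struct fields, the ConnectionType values, and the nil handling are assumptions.

```go
package main

import "fmt"

// ConnectionType is a hypothetical discriminator for connection details.
type ConnectionType string

// ConnectionTypePublicKey is an assumed constant name and value.
const ConnectionTypePublicKey ConnectionType = "PublicKey"

// PublicKeyDetails is a simplified stand-in for the type-specific block.
type PublicKeyDetails struct {
	ID string
}

// ConnectorConnectionDetails pairs the discriminator with its block.
// The field layout here is an assumption, not the project's API type.
type ConnectorConnectionDetails struct {
	Type      ConnectionType
	PublicKey *PublicKeyDetails
}

// TunnelNodeID returns the node ID for known connection types and an
// empty string for unknown types or missing blocks.
func (d *ConnectorConnectionDetails) TunnelNodeID() string {
	if d == nil {
		return ""
	}
	switch d.Type {
	case ConnectionTypePublicKey:
		if d.PublicKey != nil {
			return d.PublicKey.ID
		}
		return ""
	default:
		return ""
	}
}

func main() {
	known := &ConnectorConnectionDetails{
		Type:      ConnectionTypePublicKey,
		PublicKey: &PublicKeyDetails{ID: "node-a"},
	}
	unknown := &ConnectorConnectionDetails{Type: "SomethingElse"}
	fmt.Printf("%q %q\n", known.TunnelNodeID(), unknown.TunnelNodeID())
}
```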
The annotation still carries the full connectionDetails so the data is available for future connection types without a schema change. But the extension server reads the only field it uses today (the PublicKey node id) directly, rather than behind a type-switch helper. Pre-abstracting node-id extraction before we know what future connection types need is premature, so the ConnectorConnectionDetails.TunnelNodeID() helper is removed in favour of a direct, nil-guarded field read.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Karmada propagates a resource template's spec + metadata to member clusters but not the status subresource, so a member-cluster Connector arrives without the Ready condition or connectionDetails the edge extension server needs, and every connector reads as offline even when its tunnel is up.

Replace the bespoke connector-liveness annotation with a generic "mirror full upstream status into an annotation" mechanism:

- Replicator copies the upstream resource's .status verbatim into a resource-agnostic UpstreamStatusAnnotation, gated by an opt-in per-type flag (mirrorStatusToAnnotation); only Connector is flagged today.
- Drop the bespoke ConnectorLiveness type and its deepcopy.
- Extension server derives (online, nodeID) from a single ConnectorStatus helper, shared between the annotation (unmarshalled status) and the live status fallback, so the duplicate ready-derivation goes away.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
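The replicator side of this mechanism might look roughly like the following sketch. The annotation key is the one given in the PR description and the flag name is from the commit message; the resourceConfig and object types, the mirrorStatus function, and the choice to skip objects with no status are assumptions made to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as given in the PR description.
const upstreamStatusAnnotation = "networking.datumapis.com/upstream-status"

// resourceConfig is a hypothetical per-type replication config. Only the
// flag name mirrorStatusToAnnotation comes from the source.
type resourceConfig struct {
	mirrorStatusToAnnotation bool
}

// object is a simplified stand-in for an unstructured Kubernetes object.
type object struct {
	Annotations map[string]string
	Status      map[string]any
}

// mirrorStatus copies the upstream status into a downstream annotation
// without inspecting any type-specific fields. It is a no-op for resource
// types that have not opted in, or when the upstream has no status.
func mirrorStatus(cfg resourceConfig, upstream object, downstream *object) error {
	if !cfg.mirrorStatusToAnnotation || upstream.Status == nil {
		return nil
	}
	raw, err := json.Marshal(upstream.Status)
	if err != nil {
		return fmt.Errorf("marshal upstream status: %w", err)
	}
	if downstream.Annotations == nil {
		downstream.Annotations = map[string]string{}
	}
	downstream.Annotations[upstreamStatusAnnotation] = string(raw)
	return nil
}

func main() {
	upstream := object{Status: map[string]any{
		"conditions": []any{map[string]any{"type": "Ready", "status": "True"}},
	}}
	var downstream object
	cfg := resourceConfig{mirrorStatusToAnnotation: true}
	if err := mirrorStatus(cfg, upstream, &downstream); err != nil {
		panic(err)
	}
	fmt.Println(downstream.Annotations[upstreamStatusAnnotation])
}
```

Because the function treats the status as opaque JSON, enabling another resource type would only require setting its flag, which matches the resource-agnostic intent stated above.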
No resource sets mirrorStatusDownstream now that the Connector mirrors its upstream status into an annotation instead. The downstream status-subresource mirror (mirrorUpstreamStatusToDownstream) never reached member clusters across a Karmada hub→member boundary (the very bug this PR works around), so the field, the method, its dispatch in the reconcile path, and the now-unused status-mirror error metric are all dead code. Remove them and the stale comments that referenced them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mattdjenkinson approved these changes Jun 19, 2026
Why
A customer's Connector that is actually online and healthy is treated as offline at the edge, so their users hit a tunnel-offline error even though the tunnel is up.
Connector liveness (the Ready condition and connectionDetails) is computed authoritatively in the Project control plane and stored in the Connector's status. But the data-plane extension server runs on a member cluster and reads connectors from its local cache. Karmada propagates a resource template's spec and metadata to members, but not the status subresource, so the member-cluster Connector arrives with spec only: no Ready, no connectionDetails. Every connector reads as offline, and a healthy tunnel can never be promoted online.
What changed
A generic mirror-upstream-status-into-an-annotation mechanism:
- The replicator copies the upstream resource's .status verbatim into a resource-agnostic annotation (networking.datumapis.com/upstream-status) on the downstream object, which is metadata Karmada does propagate to members. It has no per-type field knowledge and is opt-in per resource type via a mirrorStatusToAnnotation flag; only the Connector is enabled today.
- The extension server derives (online, nodeID) from a connector status (online from the Ready condition, nodeID from connectionDetails.publicKey.id), so the annotation and live-status paths share identical classification.
- Removes the downstream status-subresource mirror (mirrorUpstreamStatusToDownstream), which wrote status to the Karmada hub via the status subresource. Karmada never propagated that to members, so it was ineffective. The annotation replaces it.
Notes
- The extension server consumes only the Ready condition + connectionDetails.publicKey.id. Tunnel host/port come from the HTTPProxy backend, not connector status, so they aren't carried.
- Lease no longer needs propagating, since Ready rides the annotation.
- gofmt / go build / go vet clean; controller + extension-server tests pass. golangci-lint could not run in this environment (built with go1.25, project targets go1.26.4); flagged for CI.
🤖 Generated with Claude Code
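The shared classification described under "What changed" could be sketched as below: one function derives (online, nodeID) from a status value, and a second chooses between the mirrored annotation and the live status. The annotation key is from the description; the struct definitions, JSON field names, and function names are assumptions based on the Ready condition and connectionDetails.publicKey.id, and the real API types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key as given in the PR description.
const upstreamStatusAnnotation = "networking.datumapis.com/upstream-status"

// The types below are simplified stand-ins for the Connector status.
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

type publicKey struct {
	ID string `json:"id"`
}

type connectionDetails struct {
	PublicKey *publicKey `json:"publicKey,omitempty"`
}

type connectorStatus struct {
	Conditions        []condition        `json:"conditions,omitempty"`
	ConnectionDetails *connectionDetails `json:"connectionDetails,omitempty"`
}

// classify derives online from the Ready condition and nodeID from
// connectionDetails.publicKey.id, guarding against nil blocks.
func classify(s connectorStatus) (online bool, nodeID string) {
	for _, c := range s.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			online = true
			break
		}
	}
	if s.ConnectionDetails != nil && s.ConnectionDetails.PublicKey != nil {
		nodeID = s.ConnectionDetails.PublicKey.ID
	}
	return online, nodeID
}

// effectiveStatus prefers the mirrored annotation and falls back to the
// live status when the annotation is absent or unparseable.
func effectiveStatus(annotations map[string]string, live connectorStatus) connectorStatus {
	if raw, ok := annotations[upstreamStatusAnnotation]; ok {
		var mirrored connectorStatus
		if err := json.Unmarshal([]byte(raw), &mirrored); err == nil {
			return mirrored
		}
	}
	return live
}

func main() {
	ann := map[string]string{
		upstreamStatusAnnotation: `{"conditions":[{"type":"Ready","status":"True"}],` +
			`"connectionDetails":{"publicKey":{"id":"node-a"}}}`,
	}
	online, nodeID := classify(effectiveStatus(ann, connectorStatus{}))
	fmt.Println(online, nodeID)
}
```

Routing both sources through one classify function is what keeps the annotation path and the fallback path from drifting apart.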