fix(connector): propagate liveness to the edge so online connectors aren't treated as offline#208

Merged: scotwells merged 5 commits into main from fix/connector-liveness-annotation on Jun 19, 2026


@scotwells scotwells commented Jun 19, 2026


Why

A customer's Connector that is actually online and healthy is treated as offline at the edge, so their users hit a tunnel-offline error even though the tunnel is up.

Connector liveness — the Ready condition and connectionDetails — is computed authoritatively in the Project control plane and stored in the Connector's status. But the data-plane extension server runs on a member cluster and reads connectors from its local cache. Karmada propagates a resource template's spec and metadata to members, but not the status subresource — so the member-cluster Connector arrives with spec only: no Ready, no connectionDetails. Every connector reads as offline, and a healthy tunnel can never be promoted online.

What changed

A generic mirror-upstream-status-into-an-annotation mechanism:

  • Replicator copies the upstream resource's full .status verbatim into a resource-agnostic annotation (networking.datumapis.com/upstream-status) on the downstream object — metadata Karmada does propagate to members. It has no per-type field knowledge and is opt-in per resource type via a mirrorStatusToAnnotation flag; only the Connector is enabled today.
  • Extension server reads the status from that annotation when present (parsed back into the connector status), otherwise falls back to the object's live status. A single helper derives (online, nodeID) from a connector status — online from the Ready condition, nodeID from connectionDetails.publicKey.id — so the annotation and live-status paths share identical classification.
  • Removes the old downstream status-mirror path (mirrorUpstreamStatusToDownstream), which wrote status to the Karmada hub via the status subresource — Karmada never propagated that to members, so it was ineffective. The annotation replaces it.
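
The replicator side of this can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Object` map type, the function signature, and the choice to leave the downstream object untouched when the upstream has no status are all assumptions made for the example. Only the annotation key and the opt-in `mirrorStatusToAnnotation` name come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UpstreamStatusAnnotation is the annotation key named in the PR description.
const UpstreamStatusAnnotation = "networking.datumapis.com/upstream-status"

// Object is a hypothetical stand-in for an unstructured Kubernetes object.
// The real replicator presumably works with typed or unstructured client objects.
type Object map[string]any

// mirrorStatusToAnnotation copies upstream's .status verbatim, as JSON, into an
// annotation on downstream. It has no knowledge of individual status fields.
// When enabled is false (the resource type has not opted in) or upstream has no
// status, downstream is left untouched (an assumption of this sketch).
func mirrorStatusToAnnotation(upstream, downstream Object, enabled bool) error {
	if !enabled {
		return nil
	}
	status, ok := upstream["status"]
	if !ok || status == nil {
		return nil
	}
	raw, err := json.Marshal(status)
	if err != nil {
		return fmt.Errorf("marshal upstream status: %w", err)
	}

	// Create metadata and annotations maps on demand.
	meta, _ := downstream["metadata"].(map[string]any)
	if meta == nil {
		meta = map[string]any{}
		downstream["metadata"] = meta
	}
	annotations, _ := meta["annotations"].(map[string]any)
	if annotations == nil {
		annotations = map[string]any{}
		meta["annotations"] = annotations
	}
	annotations[UpstreamStatusAnnotation] = string(raw)
	return nil
}

func main() {
	upstream := Object{"status": map[string]any{
		"conditions": []any{map[string]any{"type": "Ready", "status": "True"}},
	}}
	downstream := Object{}
	if err := mirrorStatusToAnnotation(upstream, downstream, true); err != nil {
		panic(err)
	}
	meta := downstream["metadata"].(map[string]any)
	fmt.Println(meta["annotations"].(map[string]any)[UpstreamStatusAnnotation])
}
```

Because the value lives in metadata rather than the status subresource, an ordinary update of the downstream object is enough to carry it, which is the property the PR relies on for Karmada propagation.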

Notes

  • Karmada has no native push-status-to-members mechanism, so an annotation is the propagation path.
  • Mirroring the full status verbatim (rather than a bespoke reduced shape) keeps the mechanism resource-agnostic — new status fields or connection types need no annotation-schema change; the consumer reads the fields it uses. Today that's the Ready condition + connectionDetails.publicKey.id. Tunnel host/port come from the HTTPProxy backend, not connector status, so they aren't carried.
  • Opt-in per type, so behavior for other replicated resources is unchanged.
  • The heartbeat Lease no longer needs propagating, since Ready rides the annotation.
  • Independent of #207 (fix(extension-server): deterministic tunnel-offline 503 on the user path for offline connectors) and complements the branded edge error page, #205 (feat(extension-server): branded data-plane error pages).
  • gofmt / go build / go vet clean; controller + extension-server tests pass. golangci-lint could not run in this environment (built with go1.25, project targets go1.26.4) — flagged for CI.
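
On the consumer side, the "read the fields it uses" point can be illustrated with a small sketch. The struct types, the `classify` and `liveness` names, and the fallback on an unparseable annotation (taken from the earlier commit messages rather than the final description) are assumptions for illustration. The field paths (the Ready condition and `connectionDetails.publicKey.id`) follow the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const upstreamStatusAnnotation = "networking.datumapis.com/upstream-status"

// Hypothetical, trimmed-down stand-ins for the Connector API types.
// Only the fields the extension server is described as reading are modelled.
type Condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
}

type PublicKey struct {
	ID string `json:"id"`
}

type ConnectionDetails struct {
	PublicKey *PublicKey `json:"publicKey,omitempty"`
}

type ConnectorStatus struct {
	Conditions        []Condition        `json:"conditions,omitempty"`
	ConnectionDetails *ConnectionDetails `json:"connectionDetails,omitempty"`
}

// classify derives (online, nodeID) from a status: online from the Ready
// condition, nodeID from a nil-guarded read of connectionDetails.publicKey.id.
func classify(s ConnectorStatus) (online bool, nodeID string) {
	for _, c := range s.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			online = true
			break
		}
	}
	if s.ConnectionDetails != nil && s.ConnectionDetails.PublicKey != nil {
		nodeID = s.ConnectionDetails.PublicKey.ID
	}
	return online, nodeID
}

// liveness prefers the mirrored annotation and falls back to the object's live
// status when the annotation is absent or does not parse. Both paths go through
// classify, so they cannot disagree on how a status is interpreted.
func liveness(annotations map[string]string, live ConnectorStatus) (bool, string) {
	if raw, ok := annotations[upstreamStatusAnnotation]; ok {
		var mirrored ConnectorStatus
		if err := json.Unmarshal([]byte(raw), &mirrored); err == nil {
			return classify(mirrored)
		}
	}
	return classify(live)
}

func main() {
	annotations := map[string]string{
		// "someFutureField" is made up; unknown fields are ignored when decoding.
		upstreamStatusAnnotation: `{"conditions":[{"type":"Ready","status":"True"}],` +
			`"connectionDetails":{"publicKey":{"id":"node-1"}},"someFutureField":1}`,
	}
	online, nodeID := liveness(annotations, ConnectorStatus{})
	fmt.Println(online, nodeID) // prints "true node-1"
}
```

Since `encoding/json` ignores fields the target struct does not declare, new status fields added upstream pass through the annotation without any change to this reader.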

🤖 Generated with Claude Code

@scotwells scotwells changed the title from "fix(connector): propagate liveness Project→member via annotation (Gap A)" to "fix(connector): propagate liveness to the edge so online connectors aren't treated as offline" on Jun 19, 2026
Connector Ready + connectionDetails are computed authoritatively in the
Project control plane and stored in the Connector status subresource. The
edge extension server runs on a member cluster and reads connectors from the
local cache to classify a tunnel online/offline. The replicator's downstream
write lands on the Karmada hub resourcetemplate, and Karmada propagates only
spec + metadata (labels/annotations) to members — NOT the status subresource.
So the member-cluster Connector carries spec only, every connector reads as
offline on the data plane, and online promotion can never fire.

Karmada has no native push-status-down mechanism, but it does propagate
template annotations to members. Carry liveness down through an annotation
instead of status:

- Replicator stamps networking.datumapis.com/connector-liveness onto the
  downstream Connector metadata (a compact JSON snapshot: ready + nodeID),
  written via the normal CreateOrUpdate Update (not the status subresource).
- Extension server reads the annotation first and falls back to status when
  it is absent/unparseable (single-cluster and pre-rollout objects).

The annotation carries only status-derived fields the ext-server consumes:
ready (from the Ready condition) and nodeID (from
status.connectionDetails.publicKey.id). TargetHost/TargetPort are NOT carried
because they derive from the HTTPProxy backend endpoint, not connector status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells and others added 4 commits June 19, 2026 10:42
The connector liveness annotation previously carried a reduced
{ready, nodeID} snapshot, hard-coding the assumption that the tunnel
node ID is the PublicKey id. Carry the upstream Connector's full
status.connectionDetails (type discriminator + type-specific block)
alongside ready instead, so additional connection types can be
supported without changing the annotation's JSON schema.

The extension server now derives the tunnel node ID via a type-aware
ConnectorConnectionDetails.TunnelNodeID() that switches on the
connection type, with an empty-string default for unknown types, and
keeps the existing status fallback when the annotation is absent or
unparseable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The annotation still carries the full connectionDetails so the data is
available for future connection types without a schema change. But the
extension server reads the only field it uses today — the PublicKey node
id — directly, rather than behind a type-switch helper. Pre-abstracting
node-id extraction before we know what future connection types need is
premature, so the ConnectorConnectionDetails.TunnelNodeID() helper is
removed in favour of a direct, nil-guarded field read.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Karmada propagates a resource template's spec + metadata to member
clusters but not the status subresource, so a member-cluster Connector
arrives without the Ready condition or connectionDetails the edge
extension server needs — and every connector reads as offline even when
its tunnel is up.

Replace the bespoke connector-liveness annotation with a generic
"mirror full upstream status into an annotation" mechanism:

- Replicator copies the upstream resource's .status verbatim into a
  resource-agnostic UpstreamStatusAnnotation, gated by an opt-in per-type
  flag (mirrorStatusToAnnotation) — only Connector is flagged today.
- Drop the bespoke ConnectorLiveness type and its deepcopy.
- Extension server derives (online, nodeID) from a single ConnectorStatus
  helper, shared between the annotation (unmarshalled status) and the
  live status fallback, so the duplicate ready-derivation goes away.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
No resource sets mirrorStatusDownstream now that the Connector mirrors
its upstream status into an annotation instead. The downstream
status-subresource mirror (mirrorUpstreamStatusToDownstream) never
reached member clusters across a Karmada hub→member boundary — the very
bug this PR works around — so the field, the method, its dispatch in the
reconcile path, and the now-unused status-mirror error metric are all
dead code. Remove them and the stale comments that referenced them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scotwells scotwells marked this pull request as ready for review June 19, 2026 16:07
@scotwells scotwells merged commit a4214b2 into main Jun 19, 2026
10 checks passed
@scotwells scotwells deleted the fix/connector-liveness-annotation branch June 19, 2026 16:11