[PLACEHOLDER — needs review] Fresh-tunnel: HTTPProxy Programmed/ConnectorMetadataProgrammed blocked ~3min on 'Downstream EnvoyPatchPolicy has no status yet'

> **⚠ Placeholder — unverified.** This issue was opened by Claude from CLI-side observations in a debugging session. **@drewr — please verify against the operator code before triaging.** I do not have direct read access to `EnvoyPatchPolicy` resources or to the controller logs, so I'm reporting symptoms + a controller-emitted reason string rather than confirmed root cause. Adjust scope / close as not-a-bug if my read is wrong.

> **Scope (narrowed 2026-06-07):** This issue is now scoped specifically to the **fresh-tunnel** case — first creation of an HTTPProxy + ConnectorAdvertisement, where the controller's `Programmed` and `ConnectorMetadataProgrammed` conditions sit at False while the downstream `EnvoyPatchPolicy` has no status yet. The originally-related symptom of "tunnel resumed with all conditions Ready at 0.1s but proxy URL serves 5xx for ~140s" is **a different problem** (iroh peer-handshake latency between the edge and a freshly-resumed agent, possibly compounded by Envoy outlier-detection ejection windows) and will be filed separately. See [the clarification comment below](#issuecomment-4644172756) for the data that motivated the split.

## Symptom

A successful tunnel-listen run today produced the following progress timeline (each line is the first moment that condition transitioned to `True`):

```
✓ connector ready (3.0s)
✓ tunnel accepted (4.6s)
✓ TLS certificate issued (4.6s)
✓ iroh DNS published (4.6s)
… route programmed still pending after 30s: Downstream EnvoyPatchPolicy has no status yet
… envoy metadata propagated still pending after 30s: Downstream EnvoyPatchPolicy has no status yet
✓ route programmed (199.1s)
✓ envoy metadata propagated (199.1s)
```

Everything cleared in under 5 seconds *except* `HTTPProxy.status.conditions[Programmed]` and `HTTPProxy.status.conditions[ConnectorMetadataProgrammed]`, both of which stayed False for ~194 seconds with the verbatim reason/message:

```
reason:  (whatever the controller is emitting for "downstream not ready")
message: Downstream EnvoyPatchPolicy has no status yet
```

Then both flipped True simultaneously. The end-user impact today was a ~3-minute setup time for a feature that's otherwise sub-5-second across all other conditions.

## What this likely means (best-effort interpretation — please verify)

The HTTPProxy controller appears to gate `Programmed` and `ConnectorMetadataProgrammed` on the readiness of a `EnvoyPatchPolicy` (Envoy Gateway resource) that it creates as a downstream artifact. While the EnvoyPatchPolicy has no `status` populated, the HTTPProxy controller surfaces \"Downstream EnvoyPatchPolicy has no status yet\" and waits.

The 3-minute gap suggests one of:

1. **Envoy Gateway reconciler is slow / running on a long interval.** If the EnvoyPatchPolicy controller's resync period is ~3 minutes (or if its work queue is backed up), this is what you'd see — every new EnvoyPatchPolicy waits a full reconcile cycle before status appears.
2. **Status is only written on a *trigger* the HTTPProxy controller doesn't watch.** If the HTTPProxy controller doesn't have a watcher on `EnvoyPatchPolicy` and only re-reconciles on its own timer (or on its own resource changes), it might not pick up the EnvoyPatchPolicy's status promptly even after the downstream controller writes it.
3. **Cold-start gateway provisioning.** If `EnvoyPatchPolicy` is the first one in this gateway / data plane, there may be a one-time provisioning step that's slow. Subsequent tunnels would be fast.
4. **Some genuine wait-on-actual-Envoy-rollout latency.** Envoy Gateway pushes configuration via xDS; the policy's status may legitimately wait for a downstream Envoy to ack the new config.

The 199.1s precise value suggests something more deterministic than \"random reconcile backlog\" — likely a resync interval or a programming-rollout SLA.

## Concrete instance from today's session

- HTTPProxy: `tunnel-gchhg` (namespace `default`, project `drewr-y4nd1b`)
- Hostname: `career-replay-kyxnz.datumproxy.net`
- Connector: `datum-connect-mhxj5` (endpoint id `c1469d2f5d1547edbdc2dcbd007d97f92be25e1efbbd9bb728a1d160797fd2b8`)
- Tunnel created at `2026-06-07T18:30:30Z`-ish (local CLI clock)
- `Programmed` / `ConnectorMetadataProgrammed` transitioned True at ~`2026-06-07T18:33:50Z`

This was on the staging environment. The relevant `EnvoyPatchPolicy` (or whatever the HTTPProxy controller derived) should still be on cluster if not GC'd.

## How to diagnose this in the future

Things that would have made today's investigation faster:

1. **Make the controller's wait reason point at the resource.** The message `Downstream EnvoyPatchPolicy has no status yet` doesn't include the name or namespace of the EnvoyPatchPolicy. If the message named it (`EnvoyPatchPolicy default/tunnel-gchhg-...`) the user / next operator could `kubectl describe` it directly without grepping the controller source.
2. **Watch the EnvoyPatchPolicy from the HTTPProxy controller** if it isn't already, so a downstream status change triggers an immediate re-reconcile rather than waiting for the HTTPProxy controller's own interval.
3. **Surface the downstream controller's progress.** If the EnvoyPatchPolicy itself has interim conditions (e.g. `Translated`, `Programmed`, `Acked`), the HTTPProxy could roll the most recent transition message up into its own status message. \"Waiting for downstream\" tells a user nothing; \"Waiting for downstream to reach Acked (currently Translated, last update 12s ago)\" tells them what's happening.
4. **Log every reconcile decision with the reason at INFO.** A controller log line like `tunnel-gchhg: deferring Programmed: downstream EnvoyPatchPolicy default/x has no status (resourceVersion=N, gen=M)` would have given a clear timeline to correlate against client-side timing.

## Why it caught us

CLI-side, [datum-cloud/app@90a7ab3](https://github.com/datum-cloud/app/commit/90a7ab3) added a progress streaming view that surfaces controller condition reasons, and that's how this latency became visible to a user for the first time. Without that streaming the user just sees `Setting up tunnel...` and then `Tunnel ready after 199 sec` — annoying but not actionable. With it, the controller's own \"Downstream EnvoyPatchPolicy has no status yet\" message is the actionable signal that points here.

## Cross-references

- App-side debugging session: datum-cloud/app#130
- Progress streaming commit that made this latency visible: [datum-cloud/app@90a7ab3](https://github.com/datum-cloud/app/commit/90a7ab3)
- 30s-stuck warning that printed the reason verbatim: see PR for `await_tunnel_progress` in `cli/src/main.rs`

---

**Action for @drewr:** verify the symptom against the controller behavior, dig into the actual EnvoyPatchPolicy reconcile path (is it slow, or is the HTTPProxy controller just slow to *notice*?), and retitle/scope this issue once root cause is known. The CLI-side diagnostic improvements may also be a separate small issue on the operator side (item 1 in the \"How to diagnose\" list — naming the resource in the message — is cheap and high-leverage).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[PLACEHOLDER — needs review] Fresh-tunnel: HTTPProxy Programmed/ConnectorMetadataProgrammed blocked ~3min on 'Downstream EnvoyPatchPolicy has no status yet' #175

Symptom

What this likely means (best-effort interpretation — please verify)

Concrete instance from today's session

How to diagnose this in the future

Why it caught us

Cross-references

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[PLACEHOLDER — needs review] Fresh-tunnel: HTTPProxy Programmed/ConnectorMetadataProgrammed blocked ~3min on 'Downstream EnvoyPatchPolicy has no status yet' #175

Description

Symptom

What this likely means (best-effort interpretation — please verify)

Concrete instance from today's session

How to diagnose this in the future

Why it caught us

Cross-references

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions