feat: replace ingress-nginx smoke cluster with Gateway API and Envoy Gateway by avirajkhare00 · Pull Request #3386 · apache/iggy

avirajkhare00 · 2026-06-01T06:51:19Z

Which issue does this PR address?

Rationale

Ingress NGINX was officially retired in March 2026. Our Helm smoke/CI
cluster setup still pulled the retired controller, which gets no further
security or bug fixes.

What changed?

scripts/ci/setup-helm-smoke-cluster.sh

Removed nginx-specific kind config (ingress-ready label, hostPort
mappings, admission-webhook polling helpers).
Install Envoy Gateway via Helm first, then upgrade Gateway API CRDs to
v1.5.0 with --server-side --force-conflicts. This ordering avoids the
safe-upgrade ValidatingAdmissionPolicy blocking EG's older bundled CRDs.
Creates an EnvoyProxy (NodePort), GatewayClass, and Gateway resource,
then waits for the Programmed condition.

scripts/ci/test-helm.sh

Removed HELM_SMOKE_INGRESS_CLASS; chart is now deployed without Ingress
objects.
After helm install, applies two HTTPRoute resources pointing at the chart
Services (server :3000, UI :3050).
Uses kubectl port-forward to expose the gateway on 127.0.0.1:8080,
which works on both Linux CI runners and macOS (unlike direct NodePort
access, which fails inside Docker Desktop's VM).

Local Execution

Passed
Pre-commit hooks ran

AI Usage

If AI tools were used, please answer:

Which tools? (e.g., GitHub Copilot, Claude, ChatGPT) -> Claude
Scope of usage? - fixed existing code
How did you verify the generated code works correctly? -> ran and verified all locally
Can you explain every line of the code if asked? -> YES

…voy Gateway

github-actions · 2026-06-01T06:51:29Z

Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

/ready - back to S-waiting-on-review after addressing feedback
/request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

hubcio · 2026-06-01T10:06:12Z

+  kubectl port-forward "svc/${svc_name}" "${pf_port}:80" \
+    --namespace "$HELM_SMOKE_GATEWAY_NAMESPACE" \
+    >/dev/null 2>&1 &
+  HELM_SMOKE_GW_PF_PID=$!


get_gateway_base_url is invoked as gateway_url="$(get_gateway_base_url)" at line 441, so it runs in a command-substitution subshell. that means this HELM_SMOKE_GW_PF_PID=$! sets the var only inside the subshell - the parent's copy stays empty, so kill "${HELM_SMOKE_GW_PF_PID:-}" at line 477 runs kill "" and does nothing, and the kubectl port-forward orphans holding 127.0.0.1:8080. the comment at 270-271 about killing it regardless of exit path isn't true. on the ephemeral ci runner the orphan dies with the vm so the workflow still passes, but running this locally as a dev tool leaves a stale port-forward that breaks the next run's bind on :8080. separately there's no trap, so even with correct scoping the kill only fires on the success path - a failed curl/test aborts under set -e before reaching line 477. cleanest fix is to start the port-forward directly in smoke() so the pid lands in the parent, plus one trap ... EXIT that kills it and removes the temp values file.

hubcio · 2026-06-01T10:06:13Z

+# Install EG first so its bundled GW API CRDs land before the v1.5 safe-upgrade VAP.
+# We then upgrade those CRDs to the pinned version via server-side apply.
+echo "Installing Envoy Gateway ${HELM_SMOKE_ENVOY_GATEWAY_VERSION}..."
+helm upgrade --install eg \


envoy gateway v1.4.1 is pinned here while the gateway api crds get upgraded to v1.5.0 (line 138) on a v1.35.0 kind node (line 54). per the envoy gateway compat matrix, eg v1.4 targets gateway api v1.3.0 and k8s v1.30-v1.33, so all three are out of the supported matrix (no eg release through v1.6 lists k8s v1.35 or gw-api v1.5). it probably still reconciles since the core Gateway/HTTPRoute types are v1 GA and the v1.5 fields are additive, but it's unsupported, and if the controller can't satisfy a v1.5 default or policy the Gateway never reaches Programmed and the wait at line 184 burns the full 5m then fails. worth bumping eg to a release that lists these versions, or pinning gw-api/kind back to the v1.4 matrix. (the k8s v1.35 pin is carried over from the old nginx path; only eg + gw-api are new here.)

hubcio · 2026-06-01T10:06:13Z

+
+  svc_selector="gateway.envoyproxy.io/owning-gateway-name=${HELM_SMOKE_GATEWAY_NAME},gateway.envoyproxy.io/owning-gateway-namespace=${HELM_SMOKE_GATEWAY_NAMESPACE}"
+
+  svc_name="$(kubectl get svc \


single-shot lookup with no retry. the Gateway reaching Programmed doesn't guarantee the owning-gateway Service object exists yet, so a transient miss here fails the run. also on an empty result kubectl ... jsonpath='{.items[0]...}' exits non-zero and prints a raw array index out of bounds to stderr - the [ -z ] guard below does still catch it (the assignment doesn't abort under set -e in this command-sub form), but both the raw error and the friendly one end up printed. add 2>/dev/null to the lookup and wrap it in a short retry.

hubcio · 2026-06-01T10:06:13Z

+    type: Kubernetes
+    kubernetes:
+      envoyService:
+        type: NodePort


type: NodePort here is effectively unused. the new kind config (lines 82-87) has no extraPortMappings, and the smoke test reaches envoy only via kubectl port-forward svc/...:80, which tunnels through the apiserver to the ClusterIP regardless of service type - the allocated node port is never dialed. default ClusterIP is enough, so the provider override can be dropped.

hubcio · 2026-06-01T10:06:13Z

+  gateway_url="$(get_gateway_base_url)"
+
  kubectl version --client=true > "$HELM_SMOKE_REPORT_DIR/kubectl-version.txt"
  kubectl -n "$HELM_SMOKE_NAMESPACE" rollout status "deployment/$HELM_SMOKE_RELEASE" --timeout="$HELM_SMOKE_TIMEOUT"


these two rollout status calls are redundant - the helm upgrade --install --wait above already blocks until both deployments are available, and you return early if helm fails, so by here they're already rolled out and these return immediately. can drop them (they do still emit a line to the ci log if that's worth keeping).

hubcio · 2026-06-01T10:06:13Z

+    >/dev/null 2>&1 &
+  HELM_SMOKE_GW_PF_PID=$!
+
+  # Wait up to 15 s for the tunnel to accept connections.


the comment says up to 15 s, but the loop is 15 iterations of curl --max-time 2 plus sleep 1, so worst case is closer to 45s if the connection hangs to max-time each iter (~30s on the connect-timeout path). just the comment is off, the loop itself is fine.

hubcio · 2026-06-01T10:06:13Z

+get_gateway_base_url() {
+  local svc_selector
+  local svc_name
+  local pf_port=8080


pf_port=8080 is the one knob in this script without a ${VAR:-default} override - every other tunable is env-overridable. minor consistency thing; an env override like HELM_SMOKE_GATEWAY_PF_PORT would let someone work around a busy host loopback.

hubcio

a couple of findings that don't map to a line in this diff:

the helm smoke job didn't actually run on this PR. .github/config/components.yml gates the helm component on paths: helm/** only, and this PR only touches scripts/ci/*.sh, so detect-changes skipped the validate+smoke matrix - the green checks here are lint/shellcheck/license, none of them helm-named. so this whole rewrite (the envoy gateway install, the v1.5 crd apply, the port-forward, the curl loops) shipped without ci ever exercising it, and future edits to these smoke scripts will keep skipping too. adding scripts/ci/setup-helm-smoke-cluster.sh and scripts/ci/test-helm.sh to the helm component paths would gate them - rust-bench-dashboard already co-lists its scripts/dashboard/** the same way. this is the one i'd fix first, it's why everything else here went unexercised.

minor: HELM_SMOKE_GATEWAY_NAMESPACE and HELM_SMOKE_GATEWAY_NAME are defined with matching defaults in both scripts, and the HTTPRoute parentRef has to match the Gateway. overriding one of them in only one script silently breaks route attach - worth a note that they have to be set together.

ci(helm): replace ingress-nginx smoke cluster with Gateway API and En…

e5c88bd

…voy Gateway

github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 1, 2026

hubcio requested changes Jun 1, 2026

View reviewed changes

github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 1, 2026

hubcio requested changes Jun 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: replace ingress-nginx smoke cluster with Gateway API and Envoy Gateway#3386

feat: replace ingress-nginx smoke cluster with Gateway API and Envoy Gateway#3386
avirajkhare00 wants to merge 1 commit into
apache:masterfrom
avirajkhare00:master

avirajkhare00 commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Uh oh!

hubcio left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		svc_selector="gateway.envoyproxy.io/owning-gateway-name=${HELM_SMOKE_GATEWAY_NAME},gateway.envoyproxy.io/owning-gateway-namespace=${HELM_SMOKE_GATEWAY_NAMESPACE}"

		svc_name="$(kubectl get svc \

Conversation

avirajkhare00 commented Jun 1, 2026

Which issue does this PR address?

Rationale

What changed?

Local Execution

AI Usage

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

hubcio left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants