feat(gateway): URL-stable identity-based cluster naming for API-level upstreams#2137
Conversation
📝 WalkthroughSummaryThis PR implements identity-based cluster naming for API-level upstreams in the gateway controller, replacing URL-derived names with stable identity names. Cluster names now follow the pattern Problem SolvedPreviously, cluster names were derived from upstream URLs (e.g., SolutionThe new naming scheme makes cluster names stable for the API's lifetime. URL edits now result in endpoint updates on the same named cluster without connection disruption. Key Changes
Benefits
WalkthroughThis PR replaces URL-derived Envoy upstream cluster naming with a deterministic, API-identity-based scheme. A new Sequence Diagram(s)sequenceDiagram
participant GatewayController
participant xDSTranslator
participant RestAPITransformer
participant clusterkey
GatewayController->>xDSTranslator: translateAPIConfig(cfg)
xDSTranslator->>clusterkey: APILevelName(upstreamName, cfg.UUID)
clusterkey-->>xDSTranslator: deterministic cluster name
xDSTranslator-->>GatewayController: Envoy xDS snapshot with identity-named clusters
GatewayController->>RestAPITransformer: Transform(rdc)
RestAPITransformer->>clusterkey: APILevelName(upstreamName, rdc.Metadata.UUID)
clusterkey-->>RestAPITransformer: deterministic cluster name
RestAPITransformer-->>GatewayController: ClusterKey + EnvoyClusterName aligned
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain modules listed in go.work or their selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
534247e to
a99b916
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
5c3aed7 to
840cb13
Compare
|
@coderabbitai please do a full review of this PR , concrete suggestions, a risk analysis, and anything else you can surface. Please read the actual code, not just the description. Correctness & design
Risk analysis
Edge cases
Tests
|
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey package ==="
fd -t f . pkg/utils/clusterkey --exec cat {}
echo ""
echo "=== translator.go - resolveUpstreamCluster and related ==="
rg -n "resolveUpstreamCluster|sanitizeClusterName|clusterkey|APILevel|APILevelName|ClusterKey|EnvoyClusterName" pkg/xds/translator.go -A 5 -B 2
echo ""
echo "=== restapi.go - addUpstreamCluster and related ==="
rg -n "addUpstreamCluster|sanitizeEnvoyClusterName|clusterkey|APILevel|APILevelName|ClusterKey|EnvoyClusterName" pkg/transform/restapi.go -A 5 -B 2
echo ""
echo "=== Full clusterkey file list ==="
fd -t f . pkg/utils/clusterkey
echo ""
echo "=== Full translator.go resolveUpstreamCluster function ==="
rg -n "func resolveUpstreamCluster" pkg/xds/translator.go -A 50
echo ""
echo "=== Full addUpstreamCluster function ==="
rg -n "func addUpstreamCluster" pkg/transform/restapi.go -A 80
echo ""
echo "=== Integration test feature file ==="
fd -t f "api-level-url-stable.feature" --exec cat {}
echo ""
echo "=== Translator test file - new test cases ==="
fd -t f . pkg/xds -e go -e _test.go --exec rg -n "DedupSameAPI|MainSandboxNever|TestResolveUpstreamCluster" {} -A 30
echo ""
echo "=== restapi transformer test - new test cases ==="
fd -t f . pkg/transform -e go --exec rg -n "APILevelDefaultCluster|APILevelURL|TestRestAPITransformer_APILevel" {} -A 30
echo ""
echo "=== cmd/controller/main.go cluster name comment ==="
rg -n "cluster" cmd/controller/main.go -B 2 -A 5
echo ""
echo "=== Check if any old sanitize references remain ==="
rg -rn "sanitizeClusterName|sanitizeEnvoyClusterName|cluster_http|cluster_https" --include="*.go"
echo ""
echo "=== Check ClusterKey and EnvoyClusterName in restapi.go ==="
rg -n "ClusterKey\|EnvoyClusterName" pkg/transform/restapi.go -B 3 -A 3
echo ""
echo "=== data-model.md cluster section ==="
fd -t f "data-model.md" --exec grep -n -A 20 -i "cluster.nam" {}Length of output: 2285 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey package full content ==="
find . -path "*/utils/clusterkey*" -type f | xargs cat
echo ""
echo "=== Check all clusterkey usages ==="
rg -rn "clusterkey\." --include="*.go" -B 2 -A 3
echo ""
echo "=== Check APILevelName / APILevel usage ==="
rg -rn "APILevelName\|APILevel(" --include="*.go" -B 2 -A 3
echo ""
echo "=== clusterkey test file ==="
find . -path "*/utils/clusterkey*_test*" -o -path "*/clusterkey*test*" | xargs cat 2>/dev/null || true
echo ""
echo "=== translator.go full resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func /' pkg/xds/translator.go 2>/dev/null | head -80
echo ""
echo "=== restapi.go full addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func /' pkg/transform/restapi.go 2>/dev/null | head -100
echo ""
echo "=== RDC struct definitions: UpstreamCluster, ClusterKey, EnvoyClusterName ==="
rg -rn "ClusterKey\b\|EnvoyClusterName\b" --include="*.go" -B 2 -A 2
echo ""
echo "=== Check for any remaining references to old URL-derived naming ==="
rg -rn 'cluster_\|scheme.*cluster\|sanitize.*cluster\|cluster.*sanitize' --include="*.go" -B 2 -A 2
echo ""
echo "=== Full translator test for new cases ==="
rg -rn "TestResolveUpstreamCluster" --include="*.go" -A 60 | head -200
echo ""
echo "=== Full restapi transformer test for new cases ==="
rg -rn "TestRestAPITransformer_APILevel" --include="*.go" -A 60 | head -200
echo ""
echo "=== xds snapshot atomicity - how snapshot is built ==="
rg -rn "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot" pkg/xds/translator.go -B 2 -A 5 | head -80
echo ""
echo "=== integration test feature file ==="
find . -name "api-level-url-stable.feature" | xargs cat 2>/dev/null
echo ""
echo "=== Check empty apiID handling ==="
rg -rn 'apiID\|api_id\|ApiId' pkg/utils/clusterkey --include="*.go" -B 1 -A 5 2>/dev/null || true
find . -path "*/clusterkey*" | xargs rg -n "empty\|\"\"" 2>/dev/null || trueLength of output: 8168 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== translator.go resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' pkg/xds/translator.go | head -100
echo ""
echo "=== translator.go clusterkey/APILevel imports and usage ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" pkg/xds/translator.go
echo ""
echo "=== restapi.go addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func [a-zA-Z]/' pkg/transform/restapi.go | head -120
echo ""
echo "=== restapi.go clusterkey/APILevel imports and usage ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" pkg/transform/restapi.go
echo ""
echo "=== restapi.go UpstreamCluster struct / ClusterKey / EnvoyClusterName field definitions ==="
grep -n "ClusterKey\|EnvoyClusterName\|UpstreamCluster" pkg/transform/restapi.go | head -40
echo ""
echo "=== Check struct definitions globally ==="
grep -rn "ClusterKey\b\|EnvoyClusterName\b" --include="*.go" . 2>/dev/null || \
grep -rn "ClusterKey\|EnvoyClusterName" . -l | xargs grep -n "ClusterKey\|EnvoyClusterName" | head -60
echo ""
echo "=== Check struct definitions globally (no --include) ==="
grep -rn "ClusterKey\|EnvoyClusterName" . --include="*.go" 2>/dev/null | head -60 || \
find . -name "*.go" | xargs grep -n "ClusterKey\|EnvoyClusterName" 2>/dev/null | head -60
echo ""
echo "=== Translator tests for new cases ==="
find . -name "*translator*test*" -o -name "*test*translator*" | grep "\.go$" | xargs grep -n "TestResolveUpstreamCluster\|DedupSameAPI\|MainSandboxNever" 2>/dev/null | head -20
find . -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster\|DedupSameAPI\|MainSandbox" 2>/dev/null | head -5
echo ""
echo "=== Full translator test cases ==="
find . -name "*_test.go" -path "*/xds/*" | head -5 | xargs -I{} awk '/func TestResolveUpstreamCluster/,/^func [A-Z]/' {} 2>/dev/null | head -200
echo ""
echo "=== Full restapi transformer test cases ==="
find . -name "*_test.go" -path "*/transform/*" | head -5 | xargs -I{} awk '/func TestRestAPITransformer_APILevel/,/^func [A-Z]/' {} 2>/dev/null | head -200
echo ""
echo "=== Integration test ==="
find . -name "api-level-url-stable.feature" 2>/dev/null | xargs cat 2>/dev/null
echo ""
echo "=== Snapshot atomicity in translator ==="
grep -n "snapshot\|Snapshot\|SetSnapshot" pkg/xds/translator.go | head -30
echo ""
echo "=== Check old URL-derived naming remnants ==="
find . -name "*.go" | xargs grep -n "cluster_http\|cluster_https\|sanitizeCluster\|sanitizeEnvoy" 2>/dev/null | head -20
echo ""
echo "=== data-model.md cluster section ==="
find . -name "data-model.md" | xargs grep -n -A 25 -i "cluster.nam\|cluster name\|cluster-nam" 2>/dev/null | head -60Length of output: 3133 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Repo root structure ==="
ls -1
echo ""
echo "=== Find translator.go ==="
find . -name "translator.go" 2>/dev/null
echo ""
echo "=== Find restapi.go ==="
find . -name "restapi.go" 2>/dev/null
echo ""
echo "=== Find clusterkey files ==="
find . -name "*.go" | xargs grep -l "APILevel\|clusterkey" 2>/dev/null | head -20
echo ""
echo "=== Find feature file ==="
find . -name "*.feature" | xargs grep -l "url-stable\|url_stable\|cluster" 2>/dev/null | head -10
echo ""
echo "=== Find test files referencing APILevel or DedupSameAPI ==="
find . -name "*_test.go" | xargs grep -l "APILevel\|DedupSameAPI\|APILevelName\|url.stable\|URLStable" 2>/dev/null | head -10
echo ""
echo "=== Full top-level tree (2 levels) ==="
find . -maxdepth 2 -type d | sort | head -60Length of output: 3555 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== translator.go resolveUpstreamCluster ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -120
echo ""
echo "=== translator.go clusterkey/APILevel/sanitize references ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize\|import" ./gateway/gateway-controller/pkg/xds/translator.go | head -40
echo ""
echo "=== restapi.go addUpstreamCluster ==="
awk '/func addUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/transform/restapi.go | head -120
echo ""
echo "=== restapi.go clusterkey/APILevel/sanitize/ClusterKey/EnvoyClusterName references ==="
grep -n "clusterkey\|APILevel\|APILevelName\|ClusterKey\|EnvoyClusterName\|sanitize" ./gateway/gateway-controller/pkg/transform/restapi.go | head -40
echo ""
echo "=== ClusterKey and EnvoyClusterName struct definitions ==="
find ./gateway -name "*.go" | xargs grep -n "ClusterKey\b\|EnvoyClusterName\b" 2>/dev/null | head -50
echo ""
echo "=== translator_test.go - new test functions ==="
awk '/func TestResolveUpstreamCluster/,/^func [A-Z]/' ./gateway/gateway-controller/pkg/xds/translator_test.go | head -200
echo ""
echo "=== restapi_test.go - new APILevel test functions ==="
awk '/func TestRestAPITransformer_APILevel/,/^}$/' ./gateway/gateway-controller/pkg/transform/restapi_test.go | head -200
echo ""
echo "=== Integration test feature file ==="
cat ./gateway/it/features/api-level-url-stable.feature
echo ""
echo "=== Snapshot atomicity in translator.go ==="
grep -n "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot\|atomic" ./gateway/gateway-controller/pkg/xds/translator.go | head -30
echo ""
echo "=== Old sanitize / URL-derived naming remnants ==="
find ./gateway -name "*.go" | xargs grep -n "cluster_http\|cluster_https\|sanitizeCluster\|sanitizeEnvoy" 2>/dev/null | head -20
echo ""
echo "=== data-model.md cluster section ==="
find ./gateway -name "data-model.md" | xargs grep -n -A 25 -i "cluster.nam\|cluster name\|cluster-nam" 2>/dev/null | head -80
echo ""
echo "=== cmd/controller/main.go comment ==="
find . -path "*/cmd/controller/main.go" | xargs grep -n "cluster\|xds\|transform" 2>/dev/null | head -20Length of output: 25751
Correctness & Design1. Both builders produce identical namesVerified. clusterName := clusterkey.APILevelName(upstreamName, apiID)
clusterKey := clusterkey.APILevelName(upstreamName, rdc.Metadata.UUID)Both call the same helper with 2. Main vs sandbox never collideVerified. 3.
|
| API count | Approximate collision probability |
|---|---|
| 10,000 | ~2.7 × 10⁻¹² (negligible) |
| 1,000,000 | ~2.7 × 10⁻⁸ (negligible) |
| 10⁹ | ~2.7 × 10⁻² (non-trivial) |
For most production deployments this is fine. However, the consequence of a collision is silent cross-wiring (two distinct APIs share one Envoy cluster), not fail-closed. This is the worst kind of failure mode for an API gateway — requests for API-A are silently routed to the backend of API-B.
Concrete suggestion: Increase the truncation to 12 bytes (96 bits, 24 hex chars). This pushes the collision probability to n² / 2⁹⁷, which is effectively impossible even at 10⁹ APIs. The known-answer test vectors would need updating, but the change is one-line in clusterkey.go:
// Before
return hex.EncodeToString(sum[:8])
// After (96-bit truncation)
return hex.EncodeToString(sum[:12])The cluster name grows from main_f9811b73ac5d1a8d (21 chars) to main_f9811b73ac5d1a8d3c2b1a09 (29 chars), well within Envoy's name limits.
R2 — Empty apiID cross-wiring
APILevel("") returns e3b0c44298fc1c14 (SHA-256 of the empty string, documented in the test). Every API with an empty apiID resolves to main_e3b0c44298fc1c14 / sandbox_e3b0c44298fc1c14, silently sharing a cluster. The test documents that emptiness is enforced upstream at deploy time, but there is no defensive guard in APILevel or APILevelName itself. If the upstream validation is bypassed or misconfigured, the failure mode is again silent cross-wiring.
Concrete suggestion: Add a guard at the top of APILevel:
func APILevel(apiID string) string {
if apiID == "" {
// Caller contract violation: enforce non-emptiness here as a last resort.
panic("clusterkey: apiID must not be empty")
}
sum := sha256.Sum256([]byte(apiID))
return hex.EncodeToString(sum[:8])
}A panic here is appropriate: an empty apiID is a programming error (not a runtime condition), and silently producing a shared cluster is more dangerous than a crash.
R3 — First-redeploy migration window
On the next configuration push after upgrading, existing clusters are renamed from cluster_http_<host> to main_<hash>. CDS and RDS are separate xDS watches: in the worst case, Envoy ACKs the new CDS snapshot (which removes the old cluster name and adds the new one) before the RDS update arrives with the repointed routes. During that gap, active routes still reference cluster_http_<host>, which no longer exists → 503 NoRoute.
This hazard is inherent to the xDS protocol and is not new to this PR. After the first redeploy the identity-stable naming eliminates this window entirely for URL edits, which is the whole point. However, the migration is worth calling out explicitly in operator runbooks, not just in PR description prose.
R4 — Observability: stat label loses hostname
The upstream_cluster Envoy stat changes from cluster_http_api.example.com_8080 to main_f9811b73ac5d1a8d. Dashboards keyed on the human-readable host lose their correlation to the actual backend. The PR description calls this out, but there is no compensating mechanism in the xDS config (e.g., cluster metadata or typed per-filter config) that would let Prometheus recording rules or Envoy access-log formatters recover the original hostname.
Concrete suggestion: Populate the cluster's metadata.filter_metadata["envoy.lb"] or a custom namespace with api_id, api_name, and env fields. This costs nothing at runtime and lets operators add metric labels without re-parsing the hash.
R5 — Cluster count increase at scale
Two APIs sharing a backend URL now each own their cluster. For a deployment with many APIs sharing a small number of backends (common in fan-out API gateway topologies), this multiplies the cluster count by a factor proportional to the number of APIs per shared backend. Envoy itself has no hard cluster limit but memory and convergence time grow. Worth monitoring in large multi-tenant deployments.
Edge Cases
E1 — API with only sandbox upstream (no main)
No test covers the case where an API is deployed with a sandbox vhost but no main upstream URL. The transformer path should skip registering a main_<hash> cluster, but any route on the main vhost that still references ClusterKey: "main_<hash>" would point at a cluster not in UpstreamClusters → policy engine 503 NoRoute. This case needs a unit test.
E2 — Main and sandbox pointing at the same URL
Correctly handled. The new naming gives them distinct clusters (main_<hash> ≠ sandbox_<hash>). The integration test does not exercise this, but it is a latent-collision fix that the unit tests cover.
Test Coverage Assessment
| Area | Coverage | Gap |
|---|---|---|
| Naming algorithm | TestAPILevel with known-answer vectors and TestAPILevelName — strong |
None |
| Both builders agree | Structural (shared helper) | No explicit cross-module assertion test |
| URL stability | TestRestAPITransformer_APILevelURLStableAcrossURLEdit |
Translator-side equivalent not shown |
| Old names absent | Integration test (cluster_http_ / cluster_https_ absence checked) |
Translator unit test does not assert absence of old name |
| DefaultCluster matches real cluster | TestRestAPITransformer_APILevelDefaultClusterMatchesRealCluster |
Good |
| Main/sandbox never collide | TestResolveUpstreamCluster_MainSandboxNeverCollide exists |
Body not verified — confirm it asserts prefix-only difference |
| Empty apiID | Documented in TestAPILevel ("empty input is deterministic") |
No test asserting deploy-time enforcement rejects it |
| Only-sandbox API | Not present | Missing |
| Integration: host change | api-level-url-stable.feature — captures cluster names before/after, asserts unchanged |
Strong |
Suggested additional test:
func TestAPILevel_OnlySandboxAPI(t *testing.T) {
// API deployed with sandbox upstream only; no main upstream registered.
// The main_<hash> cluster must NOT appear in UpstreamClusters.
transformer := NewRestAPITransformer(...)
cfg := makeRestAPIWithSandboxOnly(...)
rdc, err := transformer.Transform(cfg)
require.NoError(t, err)
expectedMain := "main_" + clusterkey.APILevel(cfg.UUID)
_, exists := rdc.UpstreamClusters[expectedMain]
assert.False(t, exists, "main cluster must not be registered when no main upstream is configured")
}Summary of Actionable Items
| # | Severity | Item |
|---|---|---|
| 1 | Medium | Increase hash truncation from 8 to 12 bytes to eliminate collision risk at scale |
| 2 | Medium | Add empty-apiID guard (panic or error) in clusterkey.APILevel |
| 3 | Low | Add cluster metadata fields (api_id, env) for observability continuity |
| 4 | Low | Add unit test: API-with-only-sandbox produces no main_<hash> cluster |
| 5 | Low | Add translator-side unit test asserting old cluster_http_ / cluster_https_ names are absent |
| 6 | Informational | Document the first-redeploy 503 window in operator runbooks |
The core design — identity in, URL out, one shared helper for both builders, ClusterKey == EnvoyClusterName — is sound and correctly implemented. The main residual risk is the 64-bit collision producing silent cross-wiring and the unguarded empty-apiID case.
840cb13 to
42cbb91
Compare
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey.go - hash truncation ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" | xargs cat
echo ""
echo "=== clusterkey_test.go - known-answer vectors ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== translator_test.go - TestResolveUpstreamCluster_NameNotURLDerived ==="
find ./gateway -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster_NameNotURLDerived\|NameNotURLDerived" -A 30 2>/dev/null || true
echo ""
echo "=== restapi_test.go - TestRestAPITransformer_APILevelMainOnlyHasNoSandboxCluster ==="
find ./gateway -name "*_test.go" -path "*/transform/*" | xargs grep -n "MainOnlyHasNoSandboxCluster\|APILevelMainOnly" -A 40 2>/dev/null || true
echo ""
echo "=== Confirm fragment length in clusterkey.go ==="
grep -n "sum\[:" $(find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go") 2>/dev/null || true
echo ""
echo "=== Any remaining 8-byte / sum[:8] references ==="
find ./gateway -name "*.go" | xargs grep -n "sum\[:8\]\|sum\[:12\]" 2>/dev/null || trueLength of output: 13837
Item 1 — 96-bit truncation ✅
Item 4 — Main-only API, no sandbox cluster/route ✅
The direction is correct — Main mandatory, Sandbox optional — and the test precisely targets the 503 NoRoute failure mode. Item 5 — Old URL-derived name absent ✅
This is a complete regression lock. Item 2 — Empty
|
| # | Item | Status |
|---|---|---|
| 1 | 96-bit truncation | ✅ Resolved |
| 2 | Empty-apiID guard | ✅ Accepted as-is |
| 3 | Cluster metadata for observability | Deferred |
| 4 | Main-only no-sandbox test | ✅ Resolved |
| 5 | Old name absence test | ✅ Resolved |
| 6 | First-redeploy runbook note | Deferred |
Items 3 and 6 are noted as follow-up work. No open blockers in the addressed items.
… upstreams Name API-level Envoy clusters from sha256(apiID) with a main_/sandbox_ prefix instead of the backend URL, so a URL edit updates endpoints on the same named cluster instead of renaming it. Both xDS builders share one clusterkey helper.
42cbb91 to
9b9ee98
Compare
…on cases Two new api-level-url-stable scenarios: main and sandbox on the same backend host get separate identity-named clusters, and two APIs sharing a backend host each route under their own cluster. Both fail under the old URL-derived naming.
|
@coderabbitai please do a full review of this PR in its current state (head has changed since your last pass, it now also includes two new integration scenarios in api-level-url-stable.feature and minor doc fixes). In particular, review all of the code comments across the PR for verbosity and redundancy. Several doc/inline comments (for example the doc comment on APILevel in pkg/utils/clusterkey/clusterkey.go, which is about 13 lines over a 2-line function) restate design rationale that is already in the PR description. Compared to the commenting conventions in neighbouring gateway-controller files, are any comments too long or redundant? Where you find some, suggest a concise replacement. Also please cover, as a normal full review:
Base all style judgments on the repo's actual conventions, not a generic standard. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== clusterkey.go full content ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go" | xargs cat
echo ""
echo "=== clusterkey_test.go full content ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== api-level-url-stable.feature full content ==="
find . -name "api-level-url-stable.feature" | xargs cat
echo ""
echo "=== Commenting conventions in neighboring files (pkg/utils) ==="
find ./gateway -path "*/utils/*.go" ! -name "*_test.go" ! -path "*/clusterkey*" | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; head -80 {}'
echo ""
echo "=== Commenting conventions in pkg/xds/translator.go (nearby exported funcs) ==="
grep -n "^// \|^func " ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== Commenting conventions in pkg/transform/restapi.go (nearby exported funcs) ==="
grep -n "^// \|^func " ./gateway/gateway-controller/pkg/transform/restapi.go | head -60
echo ""
echo "=== translator.go resolveUpstreamCluster full ==="
awk '/func resolveUpstreamCluster/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== translator.go inline comments around resolveUpstreamCluster ==="
grep -n -B2 -A20 "func resolveUpstreamCluster" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== restapi.go addUpstreamCluster full with comments ==="
grep -n -B2 -A80 "func addUpstreamCluster" ./gateway/gateway-controller/pkg/transform/restapi.go | head -100
echo ""
echo "=== cmd/controller/main.go comment about cluster naming ==="
find . -path "*/cmd/controller/main.go" | xargs grep -n -B5 -A10 "cluster\|xds\|transform\|sync" 2>/dev/null | head -60
echo ""
echo "=== translator_test.go - all new test cases full ==="
find ./gateway -name "*_test.go" -path "*/xds/*" | xargs grep -n "TestResolveUpstreamCluster" -A 50 2>/dev/null | head -300
echo ""
echo "=== restapi_test.go - all APILevel test cases full ==="
find ./gateway -name "*_test.go" -path "*/transform/*" | xargs grep -n "TestRestAPITransformer_APILevel" -A 60 2>/dev/null | head -300
echo ""
echo "=== data-model.md cluster section ==="
find . -name "data-model.md" | xargs grep -n -B5 -A40 -i "cluster" 2>/dev/null | head -120
echo ""
echo "=== Check PR diff summary via GitHub CLI ==="
gh pr view 2137 --repo wso2/api-platform --json files --jq '.files[].path' 2>/dev/null || echo "gh CLI not available"Length of output: 46691
Comment verbosity and redundancyThe repo convention in
|
| Area | Finding |
|---|---|
| Step-definition fidelity | I capture the Envoy cluster names prefixed "..." and should be unchanged are new step definitions not visible in this diff. If "unchanged" is implemented as simple string containment rather than exact set equality, the stability assertions in Scenarios 1 and 2 are weaker than intended. Worth verifying the step implementation uses set equality. |
| Independent-cluster deletion | No test asserts that deleting one of two APIs sharing a backend host leaves the other API's traffic unaffected. This is the key operational property of identity naming at scale. |
| Known-answer test coverage | clusterkey_test.go pins only two inputs ("api-1", "test-api"). A third vector for the empty-string SHA-256 is present but the more useful addition would be a vector for a realistic UUID-format apiID to confirm the function composes correctly with real inputs. |
Summary
The core logic is correct and the 96-bit truncation, shared helper, and ClusterKey == EnvoyClusterName constraints are all verified. The actionable items are:
- Shorten the
APILeveldoc comment to 3 lines (most impactful, clearly over the repo norm). - Shorten the
resolveUpstreamClusterblock comment to 3 lines. - Shorten the package comment to one line.
- Trim the
TestAPILevelfunction comment to remove the design-rationale sentence. - Scenario 5: Add an assertion for two distinct
main_<hash>cluster names and a post-delete routing check for API-A. - Verify the
should be unchangedstep definition uses exact set equality.
|
@coderabbitai addressed your review in d79e445:
Two points needed no code change: the "should be unchanged" step already uses exact set equality (strings.Join over sorted sets in steps_envoy_admin.go), and the "two distinct main_ count" assertion was intentionally skipped because a global /clusters count is unreliable (other deployed APIs also produce main_ clusters); the post-delete check covers that property robustly. Please do a final full review of this head, and as part of it verify one design question by reading the actual code across the repo (gateway-controller and platform-api): The API-level cluster name hashes the API's UUID (clusterkey.APILevel(apiID), where apiID = cfg.UUID in xds and rdc.Metadata.UUID in transform). A reviewer noted that "users can give api ids," so please confirm precisely:
Please cite file:line for the assignment, validation, and uniqueness constraints, give a clear verdict on the three risks (empty apiID, duplicate UUID, UUID changing across a URL edit), and flag anything else you find in this final pass. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Verify addressed items: clusterkey.go doc comments (current head) ==="
find ./gateway -path "*/utils/clusterkey*" -name "*.go" ! -name "*_test.go" | xargs cat
echo ""
echo "=== clusterkey_test.go (current head, check UUIDv7 vector) ==="
find ./gateway -path "*/utils/clusterkey*" -name "*_test.go" | xargs cat
echo ""
echo "=== resolveUpstreamCluster comment (current head) ==="
grep -n -B2 -A30 "func resolveUpstreamCluster\|resolveUpstreamCluster" ./gateway/gateway-controller/pkg/xds/translator.go | head -80
echo ""
echo "=== TestAPILevel comment (current head) ==="
find ./gateway -name "*_test.go" -path "*/clusterkey*" | xargs grep -n -B2 -A8 "func TestAPILevel" 2>/dev/null || true
echo ""
echo "=== api-level-url-stable.feature full content (current head) ==="
find . -name "api-level-url-stable.feature" | xargs cat
echo ""
echo "=== steps_envoy_admin.go - unchanged/set equality step ==="
find . -name "steps_envoy_admin.go" | xargs cat 2>/dev/null || find . -name "*envoy_admin*" | xargs cat 2>/dev/null
echo ""
echo "=== Where cfg.UUID is set in gateway-controller xds path ==="
rg -rn "\.UUID\b" --include="*.go" ./gateway/gateway-controller/pkg/xds/ -B3 -A3 | head -80
echo ""
echo "=== RestAPI/RDC Metadata.UUID assignment ==="
rg -rn "Metadata\.UUID\|\.UUID\s*=" --include="*.go" ./gateway/ -B3 -A3 | head -80
echo ""
echo "=== artifact-id annotation handling ==="
rg -rn "artifact-id\|artifactId\|artifact_id" --include="*.go" ./gateway/ -B3 -A5 | head -80
echo ""
echo "=== How the API deploy event / config is constructed (gateway-controller reconciler) ==="
find ./gateway -name "*.go" | xargs grep -ln "UUID\|Deploy\|Reconcile\|artifact" 2>/dev/null | grep -i "reconcil\|controller\|deploy\|handler" | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID" {} | head -20'
echo ""
echo "=== Platform-api: where API UUID is assigned on create ==="
find . -path "*/platform-api*" -name "*.go" 2>/dev/null | xargs grep -ln "UUID\|uuid\.New\|uuid\.Generate\|google/uuid" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid\.New\|uuid\.Generate" {} | head -20'Length of output: 11027 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Top-level directory structure ==="
ls -1
echo ""
echo "=== Platform-api directory ==="
find . -maxdepth 3 -type d -name "platform-api" 2>/dev/null
find . -maxdepth 3 -type d | grep -i "platform.api\|platformapi" 2>/dev/null | head -10
echo ""
echo "=== All Go files with UUID assignment ==="
rg -rn "uuid\.New\|uuid\.Generate\|NewRandom\|NewString\|google/uuid\|satori/go.uuid" --include="*.go" -B2 -A5 | head -120
echo ""
echo "=== API create handler / service ==="
find . -name "*.go" | xargs grep -ln "CreateAPI\|Create.*API\|api.*create\|POST.*api" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== Gateway-controller: how it picks up API UUID from CRD/annotation ==="
find ./gateway -name "*.go" | xargs grep -ln "artifact-id\|ArtifactID\|artifactID\|Annotation\|annotation" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|artifact\|Annotation" {} | head -30'
echo ""
echo "=== RestAPIConfig / APIConfig struct UUID field ==="
rg -rn "type.*Config\|UUID\s+string\|Id\s+string\|ApiId\s+string" --include="*.go" ./gateway/ -A2 | head -80
echo ""
echo "=== pkg/transform/restapi.go - how Metadata.UUID is populated ==="
grep -n "UUID\|Metadata\|cfg\." ./gateway/gateway-controller/pkg/transform/restapi.go | head -40
echo ""
echo "=== pkg/xds/translator.go - how cfg.UUID is populated ==="
grep -n "UUID\|cfg\." ./gateway/gateway-controller/pkg/xds/translator.go | head -40
echo ""
echo "=== RestAPIConfig struct definition ==="
rg -rn "type RestAPIConfig\|type APIConfig\|type RestAPI\b" --include="*.go" ./gateway/ -A 30 | head -80
echo ""
echo "=== Gateway Kubernetes CRD / reconciler UUID source ==="
find ./gateway -name "*.go" | xargs grep -ln "Reconcile\|Reconciler" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; cat {} | head -150'Length of output: 3213 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Locate API structs ==="
rg -rn "UUID\s" --include="*.go" ./gateway/ -B5 | head -120
echo ""
echo "=== Locate the struct that has UUID + upstream config ==="
rg -rn "type.*Spec\|type.*Config\|type.*API\b\|type.*Upstream" --include="*.go" ./gateway/ -A15 | head -200
echo ""
echo "=== How cfg is constructed / populated in translator ==="
grep -n "cfg\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== How rdc is constructed in transform ==="
grep -n "rdc\b\|RDC\b\|RuntimeDeploy" ./gateway/gateway-controller/pkg/transform/restapi.go | head -60
echo ""
echo "=== Input type for translator resolveUpstreamCluster and transform addUpstreamCluster ==="
grep -n "func.*Translate\|func.*Transform\|func.*Build\|func.*Process" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
grep -n "func.*Translate\|func.*Transform\|func.*Build\|func.*Process" ./gateway/gateway-controller/pkg/transform/restapi.go | head -20
echo ""
echo "=== Find API CRD spec type definition with UUID ==="
find ./gateway -name "*.go" -path "*/api*" | xargs grep -ln "UUID\|uuid" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; cat {}'
echo ""
echo "=== Find platform-api or control-plane code ==="
find . -maxdepth 5 -name "*.go" | xargs grep -ln "UUID\|uuid\.New" 2>/dev/null | grep -v "gateway-controller\|vendor\|_test" | head -10 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== k8s-watcher or event consumer ==="
find ./gateway -name "*.go" | xargs grep -ln "Watch\|Event\|Informer\|Queue" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "--- {} ---"; grep -n "UUID\|uuid" {} | head -20'
echo ""
echo "=== What fields does the artifact annotation carry ==="
rg -rn "artifact-id\|wso2.com/artifact" --include="*.go" ./gateway/ -B3 -A10 | head -80
rg -rn "artifact-id\|wso2.com/artifact" --include="*.yaml" ./gateway/ -B3 -A10 | head -80Length of output: 2670 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Full directory tree top 3 levels ==="
find . -maxdepth 3 -type d | sort | head -100
echo ""
echo "=== All Go files (non-vendor, non-test) containing uuid ==="
find . -name "*.go" ! -path "*/vendor/*" ! -path "*_test.go" | xargs grep -li "uuid" 2>/dev/null | head -30
echo ""
echo "=== platform-api or apk directory ==="
find . -maxdepth 2 -type d | grep -v "\.git\|node_modules" | head -80
echo ""
echo "=== RestAPIConfig or RestAPI type (look in all Go files) ==="
find . -name "*.go" ! -path "*/vendor/*" | xargs grep -ln "RestAPIConfig\|type.*RestAPI\b" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== APIDeployEvent or DeployEvent type ==="
find . -name "*.go" ! -path "*/vendor/*" | xargs grep -ln "DeployEvent\|Deploy.*Event\|Event.*Deploy" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|Deploy\|Event" {} | head -30'
echo ""
echo "=== uuid.New or uuid generation in non-test Go files ==="
find . -name "*.go" ! -path "*/vendor/*" ! -name "*_test.go" | xargs grep -n "uuid\.New\|uuid\.Generate\|uuid\.Must\|NewRandom\|NewString" 2>/dev/null | head -40Length of output: 5539 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api API create/UUID assignment ==="
find ./platform-api/src -name "*.go" | xargs grep -ln "uuid\|UUID" 2>/dev/null | head -15 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "uuid\|UUID" {} | head -30'
echo ""
echo "=== platform-api API model/struct ==="
find ./platform-api/src -name "*.go" | xargs grep -ln "type.*API\b\|RestAPI\|ApiId\|api_id" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|uuid\|ApiId\|api_id\|type.*API\b" {} | head -20'
echo ""
echo "=== gateway-controller handlers - how UUID is used ==="
find ./gateway/gateway-controller/pkg/api/handlers -name "*.go" | xargs grep -n "UUID\|uuid" 2>/dev/null | head -40
echo ""
echo "=== gateway-controller controlplane events - UUID fields ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null || echo "not found"
echo ""
echo "=== gateway-controller controlplane client - how events are received ==="
cat ./gateway/gateway-controller/pkg/controlplane/client.go 2>/dev/null | head -150
echo ""
echo "=== gateway-controller: where cfg struct is defined ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "type.*RestAPIConfig\|type.*APIConfig\|type.*APISpec" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== gateway-controller: handlers.go UUID ==="
find ./gateway/gateway-controller/pkg/api/handlers -name "handlers.go" | xargs cat 2>/dev/null | head -150
echo ""
echo "=== artifact-id annotation in gateway YAML ==="
find . -name "*.yaml" ! -path "*/.git/*" | xargs grep -ln "artifact-id\|api-platform.wso2.com" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "artifact-id\|api-platform.wso2" {} | head -10'
echo ""
echo "=== gateway-controller: annotation-based UUID reading ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -n "artifact-id\|ArtifactID\|api-platform.wso2.com" 2>/dev/null | head -30Length of output: 9465 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api src structure ==="
find ./platform-api/src -maxdepth 4 -type f -name "*.go" | sort | head -50
echo ""
echo "=== platform-api api directory ==="
find ./platform-api/src/api -name "*.go" | xargs -I{} sh -c 'echo "=== {} ==="; cat {}' 2>/dev/null | head -400
echo ""
echo "=== gateway-controller events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null
echo ""
echo "=== helpers.go in handlers ==="
cat ./gateway/gateway-controller/pkg/api/handlers/helpers.go 2>/dev/null | head -150
echo ""
echo "=== gateway-controller pkg/api/handlers/handlers.go ==="
cat ./gateway/gateway-controller/pkg/api/handlers/handlers.go 2>/dev/null | head -200Length of output: 20217 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api dto/api.go ==="
cat ./platform-api/src/internal/dto/api.go
echo ""
echo "=== platform-api dto/gateway_event.go ==="
cat ./platform-api/src/internal/dto/gateway_event.go
echo ""
echo "=== platform-api handler/api.go - CreateAPI and UUID assignment ==="
grep -n "uuid\|UUID\|Id\b\|\.Id\b\|uuid\.New\|uuid\.Generate\|uuid\.Must" ./platform-api/src/internal/handler/api.go | head -50
echo ""
echo "=== platform-api - service layer for API creation ==="
find ./platform-api/src/internal -name "*.go" -path "*service*" | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "uuid\|UUID\|\.Id\b\|uuid\.New" {} | head -20'
find ./platform-api/src/internal -name "*api*service*" -o -name "*service*api*" | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}' 2>/dev/null | head -200
echo ""
echo "=== gateway-controller/pkg/controlplane/events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go
echo ""
echo "=== gateway-controller/pkg/controlplane/client.go UUID fields ==="
grep -n "UUID\|uuid\|ApiId\|api_id\|Id\b" ./gateway/gateway-controller/pkg/controlplane/client.go | head -40
echo ""
echo "=== gateway-controller runtime_bootstrap.go UUID ==="
grep -n "UUID\|uuid" ./gateway/gateway-controller/cmd/controller/runtime_bootstrap.go | head -20
echo ""
echo "=== gateway-controller handlers.go - CreateAPI / updateAPI handlers UUID ==="
cat ./gateway/gateway-controller/pkg/api/handlers/handlers.go | head -300
echo ""
echo "=== gateway-controller helpers.go - UUID extraction from requests ==="
cat ./gateway/gateway-controller/pkg/api/handlers/helpers.go
echo ""
echo "=== Find artifact-id annotation key definition ==="
find . -name "*.go" ! -path "*/.git/*" | xargs grep -n "artifact-id\|ArtifactId\b\|artifactId\b\|api-platform.wso2.com" 2>/dev/null | grep -v "Binary\|vendor" | head -40
echo ""
echo "=== GatewayArtifact / artifact UUID field ==="
grep -n "ArtifactId\|artifact.*id\|Id.*artifact\|UUID\|uuid" ./platform-api/src/api/generated.go | grep -i "artifact\|api.id\|apiId" | head -20Length of output: 16370 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Find API service layer ==="
find ./platform-api/src/internal -type d | sort
echo ""
echo "=== Find service files ==="
find ./platform-api/src/internal -name "*.go" | grep -v "_test\|handler\|handler" | sort | head -40
echo ""
echo "=== Find uuid.New / uuid generation anywhere in platform-api ==="
find ./platform-api/src -name "*.go" | xargs grep -n "uuid\.New\|uuid\.Must\|google/uuid\|NewRandom\|NewString\|NewV4\|NewV7\|UUIDv7\|uuidv7" 2>/dev/null | head -40
echo ""
echo "=== gateway-controller events.go ==="
cat ./gateway/gateway-controller/pkg/controlplane/events.go 2>/dev/null
echo ""
echo "=== gateway-controller client.go (search for where UUID is read from event) ==="
grep -n "UUID\|uuid\|\.Id\b\|ApiId\|api_id\|Artifact" ./gateway/gateway-controller/pkg/controlplane/client.go | head -40
echo ""
echo "=== gateway-controller handlers: how they set cfg.UUID or pass apiID ==="
grep -n "UUID\|uuid\|\.Id\b\|ApiId\|cfg\." ./gateway/gateway-controller/pkg/api/handlers/handlers.go | head -40
echo ""
echo "=== Find cfg.UUID assignment in translator ==="
grep -n "cfg\.UUID\|\.UUID\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
echo ""
echo "=== Find rdc.Metadata.UUID assignment ==="
grep -n "Metadata\.UUID\|rdc\.Metadata\|\.UUID\b" ./gateway/gateway-controller/pkg/transform/restapi.go | head -20
echo ""
echo "=== RestAPIConfig struct definition in gateway-controller ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "type RestAPIConfig\|type APISpec\|type APIConfig" 2>/dev/null | xargs -I{} sh -c 'echo "=== {} ==="; grep -n -A20 "type RestAPIConfig\|type APISpec\|type APIConfig" {}'
echo ""
echo "=== artifact-id annotation - gateway YAML fields ==="
find . -name "*.go" ! -path "*/.git/*" | xargs grep -n "artifact-id\|wso2.com/artifact\|ArtifactID\b\|artifactID\b" 2>/dev/null | head -30
echo ""
echo "=== gateway-controller pkg/api/admin or management - REST API struct with UUID ==="
find ./gateway/gateway-controller/pkg/api -name "*.go" | xargs grep -ln "UUID\|type.*RestAPI\|type.*API\b" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|type.*RestAPI\|type.*API\b" {} | head -20'Length of output: 34021 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== platform-api service/api.go - UUID assignment ==="
grep -n "uuid\|UUID\|\.Id\b\|ID\b" ./platform-api/src/internal/service/api.go | head -40
echo ""
echo "=== platform-api model/api.go - API model with UUID ==="
cat ./platform-api/src/internal/model/api.go 2>/dev/null | head -80
echo ""
echo "=== platform-api service/api.go - CreateAPI function ==="
grep -n -A40 "func.*Create" ./platform-api/src/internal/service/api.go | head -80
echo ""
echo "=== gateway-controller client.go - how APIDeployedEvent.APIID becomes cfg.UUID ==="
grep -n -B5 -A20 "APIDeployedEvent\|APIID\|api_id\|apiId\|cfg\.UUID\b" ./gateway/gateway-controller/pkg/controlplane/client.go | head -120
echo ""
echo "=== gateway-controller client.go - how cfg (RestAPIConfig) is stored and its UUID field ==="
grep -n "RestAPIConfig\|APIConfig\|cfg\.UUID\|\.UUID\s*=" ./gateway/gateway-controller/pkg/controlplane/client.go | head -30
echo ""
echo "=== gateway-controller: where cfg struct is populated from event ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "RestAPIConfig\|cfg\.UUID" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "UUID\|RestAPIConfig\|cfg\." {} | head -30'
echo ""
echo "=== gateway-controller - direct YAML path (artifact-id annotation) ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -n "artifact-id\|wso2.com/artifact\|gateway.api-platform" 2>/dev/null | head -20
echo ""
echo "=== platform-api service/gateway_events.go - how APIID is populated in deploy event ==="
grep -n -B3 -A15 "APIID\|ApiId\|apiId\|UUID\|uuid" ./platform-api/src/internal/service/gateway_events.go | head -100
echo ""
echo "=== platform-api internal model/api.go ==="
cat ./platform-api/src/internal/model/api.go 2>/dev/null
echo ""
echo "=== platform-api repository for API - where UUID is generated ==="
find ./platform-api/src/internal/repository -name "*.go" | xargs grep -n "UUID\|uuid\.New" 2>/dev/null | head -30Length of output: 4861
Addressed items — verified
One residual: both the package comment and Design question: UUID provenance and controllabilityQ1 — Where is the UUID assigned, and can a client supply it?File:
Verdict: A client can supply a handle/slug via Q2 — Can an empty, duplicate, or user-controlled UUID reach
|
| Risk | Finding |
|---|---|
| Empty UUID | No path. uuid.New() always returns a 36-character non-empty string. |
| Duplicate UUID | No path. uuid.New() is crypto-random v4; birthday collision probability is negligible. |
| User-controlled UUID | No path. req.Id maps to the handle/slug only. The UUID is never taken from client input. |
| UUID changes on URL edit | No. UpdateAPI reads the existing UUID from the database; the URL edit changes only upstream config. |
Q3 — Is omitting an in-function guard in clusterkey.APILevel safe?
Yes. There is no supported path — through the platform-api control plane, the gateway-controller deploy event, or the gateway-controller REST handlers — by which an empty or user-controlled UUID reaches clusterkey.APILevel. The empty-input test in clusterkey_test.go is correctly positioned as a documentation of deterministic behaviour, not a guard for a real runtime condition.
Remaining comment observation (very minor)
The APILevelName doc (clusterkey.go Line 36–37) ends with "so both xDS builders name clusters identically." The package comment (Line 19–20) already says "shared by both xDS builders so they name clusters identically." The two sentences are equivalent. If you want to eliminate the duplication, the APILevelName doc can simply read:
// APILevelName joins the env prefix ("main"/"sandbox") to the APILevel fragment
// to form the full Envoy cluster name.Not a blocker; noting for completeness.
Overall verdict
The core logic, naming contract, and test coverage are correct and complete. All substantive review items from previous passes are addressed. The UUID provenance is fully server-controlled with no client-injection path. The PR is ready to merge.
d79e445 to
10548bd
Compare
|
@coderabbitai one final verification pass on head 10548bd, please, focused on production safety for EXISTING deployments. Read the actual code (the snapshot-building and startup config-load paths, the xDS translator, and pkg/policyxds). This PR changes API-level Envoy cluster names from URL-derived (cluster_) to identity-derived (main / sandbox_, hash = sha256(apiID)[:12]). Please confirm whether rolling this out can crash, disrupt, or drop traffic for already-running APIs, and answer concretely:
Give a clear verdict: does deploying this break or disrupt existing running APIs, and is anything required of operators beyond updating dashboards keyed on the old cluster names? Flag anything else you find in this final pass. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== pkg/xds/translator.go - snapshot building, SetSnapshot, GenerateSnapshot ==="
grep -n "snapshot\|Snapshot\|SetSnapshot\|GenerateSnapshot\|atomic\|cache\.New\|consistent" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== pkg/xds/translator.go - full snapshot assembly function ==="
awk '/func.*[Ss]napshot\|func.*[Tt]ranslat\|func.*[Gg]enerate/,/^func [a-zA-Z]/' ./gateway/gateway-controller/pkg/xds/translator.go | head -200
echo ""
echo "=== How xDS resources are grouped into a snapshot (CLS+RDS together?) ==="
grep -n "Clusters\|Routes\|Listeners\|Endpoints\|ResourcesFor\|NewSnapshot\|cache\.Snapshot\|snapshot\.Resources" ./gateway/gateway-controller/pkg/xds/translator.go | head -60
echo ""
echo "=== pkg/policyxds - all files ==="
find ./gateway -path "*/policyxds*" -name "*.go" ! -name "*_test.go" | sort | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== cmd/controller/main.go - startup/bootstrap path ==="
cat ./gateway/gateway-controller/cmd/controller/main.go | head -300
echo ""
echo "=== Startup: how existing configs are loaded on boot ==="
find ./gateway/gateway-controller -name "*.go" | xargs grep -ln "bootstrap\|Bootstrap\|startup\|startup\|loadConfig\|LoadConfig\|loadAll\|reconcileAll\|reconcileExisting\|InitCache\|initCache" 2>/dev/null | head -10 | xargs -I{} sh -c 'echo "=== {} ==="; cat {} | head -150'
echo ""
echo "=== How existing API configs are replayed into cache on startup ==="
grep -n "for\|range\|existing\|list\|List\|cache\|Cache\|init\|Init" ./gateway/gateway-controller/pkg/controlplane/client.go | head -60
echo ""
echo "=== client.go startup config load full ==="
cat ./gateway/gateway-controller/pkg/controlplane/client.go | head -200
echo ""
echo "=== Check if cluster name is stored in DB or any config store ==="
find ./platform-api/src -name "*.go" | xargs grep -n "cluster_\|ClusterName\|EnvoyCluster\|cluster_name" 2>/dev/null | head -30
echo ""
echo "=== Check if cluster name is stored in k8s CRD or annotation ==="
find ./gateway -name "*.go" | xargs grep -n "cluster_\|ClusterName\|EnvoyCluster\|cluster_name" 2>/dev/null | grep -v "_test\|clusterkey\|APILevel\|sanitize" | head -30
echo ""
echo "=== Check for any SQL files or schema files ==="
find . -name "*.sql" -o -name "*schema*" -o -name "*migration*" 2>/dev/null | grep -v ".git" | head -20Length of output: 145966 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== xDS cache type - consistent vs naive ==="
grep -rn "consistent\|Consistent\|snapshot_cache\|NewSnapshotCache\|NewSnapshotCacheWithOptions\|cache\.New\b\|NewLinearCache" ./gateway/gateway-controller/ --include="*.go" | head -30
echo ""
echo "=== xDS snapshot construction - what resources go into one snapshot call ==="
grep -n "cache\.NewSnapshot\|NewSnapshot\|snapshot\.New\|MakeSnapshot" ./gateway/gateway-controller/pkg/xds/translator.go | head -20
grep -rn "cache\.NewSnapshot\|NewSnapshot\|snapshot\.New\|MakeSnapshot" ./gateway/gateway-controller/ --include="*.go" | head -30
echo ""
echo "=== pkg/xds/translator.go SetSnapshot call with all resource types ==="
grep -n -B5 -A30 "SetSnapshot\|setSnapshot\|cache\.SetSnapshot" ./gateway/gateway-controller/pkg/xds/translator.go | head -100
echo ""
echo "=== policyxds snapshot.go - full content ==="
cat ./gateway/gateway-controller/pkg/policyxds/snapshot.go
echo ""
echo "=== policyxds - SetSnapshot with all resource types ==="
grep -n -B5 -A30 "SetSnapshot\|NewSnapshot\|snapshot\." ./gateway/gateway-controller/pkg/policyxds/snapshot.go | head -100
echo ""
echo "=== Does policyxds snapshot include cluster names? ==="
grep -n "Cluster\|cluster\|ClusterKey\|EnvoyClusterName" ./gateway/gateway-controller/pkg/policyxds/snapshot.go | head -30
echo ""
echo "=== startup bootstrap: how does the controller load existing API state ==="
cat ./gateway/gateway-controller/cmd/controller/runtime_bootstrap.go
echo ""
echo "=== startup: initial xDS cache population ==="
grep -rn "for.*range\|for.*apis\|for.*all\|existing\|loadExisting\|initSnapshot\|bootstrap\|Bootstrap" ./gateway/gateway-controller/pkg/xds/ --include="*.go" -A5 | head -100
echo ""
echo "=== startup/init: how many snapshots are created per boot (one per API or one consolidated?) ==="
grep -rn "SetSnapshot\|snapshot" ./gateway/gateway-controller/pkg/xds/translator.go | grep -i "set\|create\|init\|boot" | head -30
echo ""
echo "=== Check xDS version in go.mod - solo-io/go-control-plane or envoyproxy? ==="
grep "go-control-plane\|envoy.*xds\|xds.*server" ./gateway/gateway-controller/go.mod | head -10
echo ""
echo "=== Connection draining: drain timeout, drain_connections, close_connections_on_timeout ==="
grep -rn "DrainTimeout\|drain_timeout\|drain_connections\|close_conn\|DRAIN" ./gateway/gateway-controller/ --include="*.go" | head -20
grep -rn "DrainTimeout\|drain_timeout" ./gateway/ --include="*.yaml" --include="*.json" | head -10Length of output: 6357 🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== pkg/xds/snapshot.go full content ==="
cat ./gateway/gateway-controller/pkg/xds/snapshot.go
echo ""
echo "=== pkg/xds/translator.go - how it calls snapshot manager ==="
grep -n "snapshotManager\|SnapshotManager\|BuildSnapshot\|UpdateSnapshot\|GenerateSnapshot\|Translate\b" ./gateway/gateway-controller/pkg/xds/translator.go | head -30
echo ""
echo "=== main.go - how startup calls xds snapshot generation ==="
grep -n "GenerateSnapshot\|UpdateSnapshot\|BuildSnapshot\|snapshotManager\|policyManager\|loadRuntime\|hydrateConfig\|LoadArtifact\|xds\." ./gateway/gateway-controller/cmd/controller/main.go | head -60
echo ""
echo "=== how startup triggers initial xDS snapshot (look for the initial snapshot push) ==="
grep -n -A10 "loadRuntimeConfigs\|GenerateSnapshot\|UpdateSnapshot\|InitialSnapshot\|snapshotManager" ./gateway/gateway-controller/cmd/controller/main.go | head -100
echo ""
echo "=== pkg/xds/snapshot.go - NewSnapshot call with all resource types ==="
grep -n "NewSnapshot\|cache\.Snapshot\|resources\[" ./gateway/gateway-controller/pkg/xds/snapshot.go | head -30
echo ""
echo "=== drain timeout in envoy bootstrap config ==="
find ./gateway -name "*.yaml" -o -name "*.json" | xargs grep -ln "drain_time\|drain_timeout\|overload\|connection_pool" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; grep -n "drain_time\|drain_timeout\|per_connection_buffer\|connection_pool" {}'
echo ""
echo "=== envoy bootstrap or static resources ==="
find ./gateway -name "envoy-bootstrap*" -o -name "envoy.yaml" -o -name "*bootstrap*.yaml" 2>/dev/null | head -5 | xargs -I{} sh -c 'echo "=== {} ==="; cat {}'
echo ""
echo "=== check DB schema for cluster name fields ==="
find . -name "*.go" -path "*/storage*" | xargs grep -n "cluster_name\|ClusterName\|EnvoyCluster" 2>/dev/null | head -20
echo ""
echo "=== Verify startup generates one consolidated snapshot (not per-API) ==="
grep -n "for.*range\|for.*cfg\|for.*api" ./gateway/gateway-controller/pkg/xds/snapshot.go | head -20Length of output: 10951
1. Rollout atomicity and startup snapshot consolidationSingle atomic snapshot — confirmed. All resource types share one version string. Startup — one consolidated snapshot, not N per-API. On boot:
There is no per-API snapshot push on startup. 2. CDS-vs-RDS ordering windowThe window exists but is inherent to xDS SotW — not introduced by this PR.
The theoretical window: Envoy receives and ACKs the new CDS (warms In practice, Envoy's cluster warm-and-ready sequence delays the CDS ACK until all new clusters are warmed, which gives RDS time to arrive. But this is not a hard guarantee under all timing conditions. This risk is one-time on upgrade — it applies only to the initial cluster rename on the first config regeneration after deploying this PR. Post-upgrade, URL edits no longer rename the cluster, so this window never recurs for URL changes. 3. In-flight connection handling during the renameNo explicit 4. Persistence — no cluster names stored anywhereConfirmed: cluster name is purely derived in memory, never persisted.
5. Operational side effects
Neither effect causes a crash or routing failure. 6. RevertabilitySafe. The URL config (scheme, host, port, path) is stored in the DB as part of the API spec. Reverting the PR leaves the DB untouched; the old code regenerates VerdictDeploying this PR does not crash or permanently disrupt existing APIs. The only production risk is the one-time CDS-before-RDS window on the first config regeneration after upgrade, where a brief 503 is possible per API. This is inherent to xDS SotW and applies to any cluster rename operation in this stack; it is not new to this PR. Operator actions required:
|
|
@coderabbitai quick clarification on the operator note for #2137. The bundled Grafana dashboards in gateway/observability/grafana/dashboards/ (gateway-controller.json, infrastructure-overview.json, policy-engine.json) appear NOT to use the per-cluster upstream_cluster label. Can you confirm: do any shipped dashboards or alerts in this repo key on cluster names (upstream_cluster / cluster_http_)? So is the "update dashboards" action actually required for anything in-repo, or only for custom user-built per-cluster dashboards? Cite file:line. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
echo "=== Files in gateway/observability ==="
find ./gateway/observability -type f | sort
echo ""
echo "=== Search for upstream_cluster in all observability files ==="
rg -n "upstream_cluster\|cluster_http\|cluster_https" ./gateway/observability/ || echo "NO MATCHES"
echo ""
echo "=== Search for upstream_cluster / cluster_http anywhere in the repo (non-Go) ==="
rg -n "upstream_cluster\|cluster_http\|cluster_https" --type-not=go . 2>/dev/null | grep -v "\.git" | grep -v "_test\." | head -40 || echo "NO MATCHES"
echo ""
echo "=== gateway-controller.json cluster references ==="
cat ./gateway/observability/grafana/dashboards/gateway-controller.json 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
content = json.dumps(data)
import re
hits = [(m.start(), content[max(0,m.start()-60):m.end()+60]) for m in re.finditer(r'upstream_cluster|cluster_http|cluster_https', content)]
for pos, ctx in hits:
print(f'pos={pos}: ...{ctx}...')
if not hits:
print('NO MATCHES in gateway-controller.json')
" 2>/dev/null || echo "gateway-controller.json not found or not valid JSON"
echo ""
echo "=== infrastructure-overview.json cluster references ==="
cat ./gateway/observability/grafana/dashboards/infrastructure-overview.json 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
content = json.dumps(data)
import re
hits = [(m.start(), content[max(0,m.start()-60):m.end()+60]) for m in re.finditer(r'upstream_cluster|cluster_http|cluster_https', content)]
for pos, ctx in hits:
print(f'pos={pos}: ...{ctx}...')
if not hits:
print('NO MATCHES in infrastructure-overview.json')
" 2>/dev/null || echo "infrastructure-overview.json not found or not valid JSON"
echo ""
echo "=== policy-engine.json cluster references ==="
cat ./gateway/observability/grafana/dashboards/policy-engine.json 2>/dev/null | python3 -c "
import sys, json
data = json.load(sys.stdin)
content = json.dumps(data)
import re
hits = [(m.start(), content[max(0,m.start()-60):m.end()+60]) for m in re.finditer(r'upstream_cluster|cluster_http|cluster_https', content)]
for pos, ctx in hits:
print(f'pos={pos}: ...{ctx}...')
if not hits:
print('NO MATCHES in policy-engine.json')
" 2>/dev/null || echo "policy-engine.json not found or not valid JSON"
echo ""
echo "=== Any Prometheus alert/recording rules in repo ==="
find . -name "*.yaml" -o -name "*.yml" | xargs grep -ln "upstream_cluster\|cluster_http\|cluster_https" 2>/dev/null | grep -v ".git" | head -20 || echo "NO MATCHES in YAML files"Length of output: 3801
The "update dashboards" action item from the migration notes applies only to custom, user-built dashboards or alerts outside this repo. Nothing shipped in-repo needs to change. The operator note can be narrowed to: "Custom dashboards or alerts keyed on the old |
…rage Shorten the doc comments on clusterkey.APILevel, the package, and resolveUpstreamCluster to the repo's terse style. Add a UUID-shaped known-answer vector and an integration check that deleting one API does not disturb another sharing the same backend host.
10548bd to
f8fcbb9
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
gateway/gateway-controller/pkg/transform/restapi_test.go (1)
503-522: ⚡ Quick winAssert endpoint update in the URL-edit stability test.
This test currently validates only stable cluster naming. Add an assertion that the upstream endpoint value changes across
cfgA→cfgBso it verifies the full contract (stable name, updated backend target).Proposed test enhancement
nameA := rdcA.Routes["GET|/test/users|main.local"].Upstream.ClusterKey nameB := rdcB.Routes["GET|/test/users|main.local"].Upstream.ClusterKey assert.Equal(t, nameA, nameB, "API-level main cluster name must not depend on URL "+ "(URL-stable contract: the name must survive URL edits)") + + clusterA, okA := rdcA.UpstreamClusters[nameA] + clusterB, okB := rdcB.UpstreamClusters[nameB] + require.True(t, okA) + require.True(t, okB) + require.Len(t, clusterA.Endpoints, 1) + require.Len(t, clusterB.Endpoints, 1) + assert.NotEqual(t, clusterA.Endpoints[0], clusterB.Endpoints[0], + "URL edit should update endpoint details on the same cluster name") }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@gateway/gateway-controller/pkg/transform/restapi_test.go` around lines 503 - 522, The test TestRestAPITransformer_APILevelURLStableAcrossURLEdit validates that cluster naming remains stable when the upstream URL changes from cfgA to cfgB, but it does not verify that the actual upstream endpoint values differ between rdcA and rdcB. Add an assertion after the existing assert.Equal call that compares the upstream endpoint values from both rdcA and rdcB routes and verifies they are different (using assert.NotEqual), ensuring the test validates the complete contract that cluster names are stable while backend targets are updated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@gateway/gateway-controller/pkg/transform/restapi_test.go`:
- Around line 503-522: The test
TestRestAPITransformer_APILevelURLStableAcrossURLEdit validates that cluster
naming remains stable when the upstream URL changes from cfgA to cfgB, but it
does not verify that the actual upstream endpoint values differ between rdcA and
rdcB. Add an assertion after the existing assert.Equal call that compares the
upstream endpoint values from both rdcA and rdcB routes and verifies they are
different (using assert.NotEqual), ensuring the test validates the complete
contract that cluster names are stable while backend targets are updated.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2fee9be3-4100-4c76-a981-42a203012318
📒 Files selected for processing (12)
gateway/gateway-controller/cmd/controller/main.gogateway/gateway-controller/pkg/policyxds/policyxds_test.gogateway/gateway-controller/pkg/transform/restapi.gogateway/gateway-controller/pkg/transform/restapi_test.gogateway/gateway-controller/pkg/utils/clusterkey/clusterkey.gogateway/gateway-controller/pkg/utils/clusterkey/clusterkey_test.gogateway/gateway-controller/pkg/xds/translator.gogateway/gateway-controller/pkg/xds/translator_test.gogateway/it/features/api-level-url-stable.featuregateway/it/steps_envoy_admin.gogateway/it/suite_test.gogateway/spec/impls/1-basic-gateway-with-controller/data-model.md
Purpose
API-level Envoy cluster names were derived from the backend URL (
cluster_<scheme>_<sanitized-host>). Because the name depended on the URL, editing a backend URL renamed the cluster: the old cluster was destroyed, a new one created, and routes repointed, draining in-flight connections on every URL change. This makes a routine config edit disruptive to live traffic.Goals
Make an API's main/sandbox cluster names stable for the lifetime of the API, so a backend-URL edit becomes an endpoint update on the same named cluster instead of a cluster rename. As a side effect, fix a latent collision where a main and sandbox upstream on the same URL, or two different APIs sharing a backend URL, collapsed into one shared cluster.
Approach
Derive the cluster name from the API identity instead of the URL:
main_<hash>/sandbox_<hash>, where the hash is the first 12 bytes ofsha256(apiID)and the env is a plaintextmain_/sandbox_prefix (not part of the hash). A single shared helper (pkg/utils/clusterkey) is used by both xDS builders (the Envoy xDS translator and the RuntimeDeployConfig transform path), so the two paths cannot diverge, and the two hand-rolled URL-sanitizing functions are removed.ClusterKeyandEnvoyClusterNameare unified to the same string so the policy engine never points Envoy at an unknown cluster (which would surface as 503 NoRoute).Rollout is a one-time, in-memory effect: on the next config regeneration each API's clusters are renamed once, delivered atomically (new cluster plus repointed routes in a single xDS snapshot, validated by
snapshot.Consistent()), so live traffic is not dropped. There is no database migration, the name is derived at translate time and never persisted.User stories
N/A. This is an internal gateway cluster-naming change with no user-facing API or behavior change.
Documentation
N/A for external product docs. In-repo, the cluster-naming section of
gateway/spec/impls/1-basic-gateway-with-controller/data-model.mdis updated from the old URL-sanitized scheme to the identity-derived scheme.Automation tests
Security checks
Samples
No new configuration or API surface is introduced (cluster naming is internal), so this shows the observable behavior with any REST API.
Deploy an API with a main upstream:
The gateway names the Envoy cluster from the API identity, e.g.
main_5b50e1f10eb52155ebe4d947(themain_prefix plus the first 12 bytes ofsha256(apiID)), not from the URL.Now change the upstream to a different host and port:
The cluster name stays
main_5b50e1f10eb52155ebe4d947, so the route keeps pointing at the same named cluster and Envoy updates the endpoint in place. Under the old scheme the name would have changed fromcluster_http_backend-a_8080tocluster_http_backend-b_9090, renaming the cluster on every URL edit.Related PRs
N/A.
Test environment
GitHub Actions CI on Linux: Go unit tests and the gateway integration-test suite. No JDK, database, or browser dependency for this change.