
feat: per-deployment internal RPC ClusterIP Service + status.rpcService#99

Merged
bdchatham merged 2 commits into main from feat/seinodedeployment-rpc-service
Apr 17, 2026
Conversation

@bdchatham
Collaborator

Summary

Adds a cluster-internal ClusterIP Service to every SeiNodeDeployment so in-cluster consumers can dial a single stable DNS name — {deployment}-rpc.{namespace}.svc — rather than picking an ordinal on the per-node headless Service and handling failover themselves. kube-proxy L4 load-balances across ready child pods.

Reconciled unconditionally — this path is independent of .spec.networking and lives alongside (not replacing) the existing external reconcileNetworking / HTTPRoute / External-DNS pipeline.

References

Interface contract (frozen)

```yaml
status:
  rpcService:
    name: <deployment-name>-rpc
    namespace: <namespace>
    ports:
      rpc: 26657
      evmHttp: 8545
      evmWs: 8546
      rest: 1317
      grpc: 9090
```

  • Service: ClusterIP, `PublishNotReadyAddresses: false`.
  • Selector: `sei.io/nodedeployment=<deployment>` (already stamped on child pod templates via `generateSeiNode`'s `spec.PodLabels[groupLabel]`).
  • OwnerReferences point at the parent SeiNodeDeployment (cascade delete on `DeletionPolicy: Delete`; orphaned via `orphanNetworkingResources` on Retain).
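The contract above pins a deterministic Service name, a group-only selector, and five frozen port names. As a rough sketch of that contract (simplified structs standing in for `corev1.Service`; the real generator lives in `internal_service.go` and returns a full Kubernetes object):

```go
package main

import "fmt"

// ServiceSpec is a simplified stand-in for the corev1.ServiceSpec fields
// the contract pins: selector and named ports. Not the controller's real type.
type ServiceSpec struct {
	Selector map[string]string
	Ports    map[string]int32 // port name -> container port
}

const groupLabel = "sei.io/nodedeployment" // label stamped on child pod templates

// generateInternalRpcService sketches the pure generator: deterministic
// "<deployment>-rpc" name, group-only selector (no revision label), and
// the five frozen named ports from the interface contract.
func generateInternalRpcService(deployment string) (string, ServiceSpec) {
	return deployment + "-rpc", ServiceSpec{
		Selector: map[string]string{groupLabel: deployment},
		Ports: map[string]int32{
			"rpc":      26657,
			"evm-http": 8545,
			"evm-ws":   8546,
			"rest":     1317,
			"grpc":     9090,
		},
	}
}

func main() {
	name, spec := generateInternalRpcService("demo-rpc")
	fmt.Println(name, spec.Selector[groupLabel], spec.Ports["evm-http"])
}
```

Because the generator is pure, the name, selector, and port set can be asserted without a cluster, which is what the generator tests in `internal_service_test.go` do.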

Files changed

| File | What |
| --- | --- |
| `api/v1alpha1/seinodedeployment_types.go` | New `RpcServiceStatus` / `RpcServicePorts` types; `.status.rpcService` pointer field. |
| `api/v1alpha1/zz_generated.deepcopy.go` | Regenerated. |
| `config/crd/sei.io_seinodedeployments.yaml`, `manifests/sei.io_seinodedeployments.yaml` | Regenerated CRD schema. |
| `internal/controller/nodedeployment/internal_service.go` (new) | `generateInternalRpcService` (pure) + `reconcileInternalRpcService` (SSA, stamps status in-memory). |
| `internal/controller/nodedeployment/internal_service_test.go` (new) | Generator tests + fake-client reconcile / orphan tests. |
| `internal/controller/nodedeployment/controller.go` | Invoke `reconcileInternalRpcService` unconditionally in the reconcile loop. |
| `internal/controller/nodedeployment/networking.go` | `orphanNetworkingResources` now also strips the ownerRef on the internal Service. |

Judgment calls / deviations

  • Port name `evm-http` (not seiconfig's `evm-rpc`): the milestone brief explicitly pins the five port names in the Service spec. We keep the brief's names, which also read better in kube-native tooling; the numeric port is unchanged.
  • Selector excludes revision label. The shared external Service pins to the active revision during a rollout (groupSelector); the internal Service uses groupOnlySelector so kube-proxy continues to route to whichever pods are Ready across incumbents/entrants. Covered by TestGenerateInternalRpcService_SelectorIgnoresRevision.
  • No new condition type (per non-goal): the Service is unconditionally reconciled, so there's nothing lifecycle-worthy to surface.
  • Single-patch model preserved: reconcileInternalRpcService mutates group.Status.RpcService in-memory. updateStatus calls the existing Status().Patch() at the end of every reconcile path (including the DNS-pending early return), so no second patch was added.
  • Label propagation was already correct. generateSeiNode already writes spec.PodLabels[groupLabel] = group.Name, which flows through noderesource.ResourceLabels onto the StatefulSet's pod template. No changes needed to the labeling path — the selector contract is already satisfied.
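The single-patch model in the fourth bullet can be sketched as follows (stand-in types and names mirror the PR's description, not the controller's actual API): each step mutates status in memory, and one flush at the end of the reconcile path issues the only status patch.

```go
package main

import "fmt"

// RpcServiceStatus is a simplified stand-in for the new status type.
type RpcServiceStatus struct{ Name, Namespace string }

// Deployment stands in for SeiNodeDeployment for this sketch.
type Deployment struct {
	Name, Namespace string
	Status          struct{ RpcService *RpcServiceStatus }
}

// reconcileInternalRpcService stamps status in memory only; it issues no
// API patch of its own, preserving the single-patch model.
func reconcileInternalRpcService(group *Deployment) {
	group.Status.RpcService = &RpcServiceStatus{
		Name:      group.Name + "-rpc",
		Namespace: group.Namespace,
	}
}

// updateStatus models the single flush point every reconcile path hits
// (the existing Status().Patch() call in the real controller).
func updateStatus(group *Deployment) string {
	return fmt.Sprintf("PATCH status: rpcService=%s/%s",
		group.Status.RpcService.Namespace, group.Status.RpcService.Name)
}

func main() {
	g := &Deployment{Name: "demo-rpc", Namespace: "sei"}
	reconcileInternalRpcService(g) // in-memory mutation only
	fmt.Println(updateStatus(g))   // one patch at the end of the reconcile
}
```

The benefit is that every early-return path (including the DNS-pending one) still flushes whatever status was stamped, without adding a second patch call.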

Manual testing

```shell
# 1. Build + load image, apply CRDs + controller.
make docker-build IMG=sei-k8s-controller:m1-test
# (load into your target cluster / kind)
kubectl apply -f manifests/sei.io_seinodedeployments.yaml

# 2. Apply a private (no .spec.networking) SeiNodeDeployment.
cat <<'YAML' | kubectl apply -f -
apiVersion: sei.io/v1alpha1
kind: SeiNodeDeployment
metadata:
  name: demo-rpc
  namespace: sei
spec:
  replicas: 2
  updateStrategy:
    type: InPlace
  template:
    spec:
      chainId: pacific-1
      image: ghcr.io/sei-protocol/seid:v1.0.0
      fullNode:
        snapshot:
          s3:
            targetHeight: 100000
YAML

# 3. Verify Service and status are present.
kubectl -n sei get svc demo-rpc-rpc -o yaml
kubectl -n sei get snd demo-rpc -o jsonpath='{.status.rpcService}'

# Expected:
#   Service demo-rpc-rpc with 5 named ports (rpc/evm-http/evm-ws/rest/grpc).
#   Selector: sei.io/nodedeployment=demo-rpc
#   OwnerReference -> demo-rpc (controller=true)
#   .status.rpcService.{name, namespace, ports.{rpc,evmHttp,evmWs,rest,grpc}} populated.

# 4. Once pods are Ready, DNS + traffic should work:
kubectl -n sei run curl --image=curlimages/curl --rm -it --restart=Never -- \
  curl -sS http://demo-rpc-rpc.sei.svc:26657/status

# 5. Cascade delete:
kubectl -n sei delete snd demo-rpc
kubectl -n sei get svc demo-rpc-rpc   # should be gone
```
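The DNS name the curl step dials follows the `{deployment}-rpc.{namespace}.svc` shape from the PR summary. A trivial Go helper makes the construction explicit (the helper name is ours, for illustration only):

```go
package main

import "fmt"

// serviceURL builds the cluster-internal URL for the per-deployment
// Service: http://{deployment}-rpc.{namespace}.svc:{port}. Illustrative
// helper; the controller does not ship this function.
func serviceURL(deployment, namespace string, port int) string {
	return fmt.Sprintf("http://%s-rpc.%s.svc:%d", deployment, namespace, port)
}

func main() {
	// Matches the manual-test curl target above.
	fmt.Println(serviceURL("demo-rpc", "sei", 26657))
}
```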

Test plan

  • make lint clean.
  • make test green (new tests: 9 generator + 5 reconcile/orphan cases).
  • make manifests generate idempotent after commit.
  • make build / make ci succeed.
  • Manual smoke against a real cluster (reviewer).

bdchatham and others added 2 commits April 17, 2026 11:21
Adds a cluster-internal ClusterIP Service to every SeiNodeDeployment so
in-cluster consumers can dial a single stable DNS name
({deployment}-rpc.{namespace}.svc) rather than chasing ordinals on
per-node headless Services. kube-proxy L4 load-balances across ready
child pods via the existing sei.io/nodedeployment pod label.

Reconciled unconditionally — lives alongside (not replacing) the
.spec.networking / HTTPRoute path.

- API: new RpcServiceStatus / RpcServicePorts types; additive pointer
  field .status.rpcService on SeiNodeDeployment.
- Generator: pure generateInternalRpcService with named ports
  (rpc/evm-http/evm-ws/rest/grpc) per the milestone interface contract.
- Reconcile: new reconcileInternalRpcService invoked from the deployment
  reconcile loop; populates status.rpcService in-memory for the existing
  single Status().Patch() flush.
- Orphan path: retain-policy now strips the internal Service's
  ownerRef alongside the external one.
- Tests: pure-generator and fake-client reconcile coverage (status
  stamping, ownerRef shape, idempotency, orphan path).

Ports use "evm-http" in the Service (not seiconfig's "evm-rpc") because
the milestone interface contract fixes those names for kube-native tools.

Refs: platform#96

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concerns folded into one follow-up:

1. Rename `rpcService` → `internalService` (status field, types, Service
   name suffix, tests, godoc). The Service is the single internal access
   point; naming it mode-neutral ages better than "RPC"-scoped naming,
   especially with the stateful ports dropped below.

2. Drop stateful ports (evm-ws 8546, grpc 9090) from the Service and
   status schema. A kube-proxy L4 LB spreads connections across pods,
   which breaks WebSocket subscriptions and pins HTTP/2 gRPC
   per-connection — neither load-balances correctly. Remaining ports:
   rpc (26657), evm-http (8545), rest (1317) — all stateless HTTP
   request/response. Stateful consumers use per-node headless Services.

3. Move internal Service orphan handling out of
   `orphanNetworkingResources` into a new `orphanInternalService` method.
   The internal Service's lifecycle is unconditional; it should not be
   bundled with the networking-resources teardown. Added a test for
   `.spec.networking → nil` transitions confirming the internal Service
   survives.

All tests green (lint + test). CRD + DeepCopy regenerated via
`make manifests generate`.
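The orphan behavior in point 3 amounts to stripping the parent's ownerReference so a Retain deletion policy leaves the Service behind. A stdlib-only sketch (stand-in types; the real `orphanInternalService` patches a `corev1.Service` through the API client):

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	UID        string
	Controller bool
}

// Service is a simplified stand-in for corev1.Service metadata.
type Service struct {
	Name            string
	OwnerReferences []OwnerRef
}

// orphanInternalService removes only the parent deployment's
// ownerReference, leaving any other references intact, so garbage
// collection no longer cascades the parent's deletion onto the Service.
func orphanInternalService(svc *Service, parentUID string) {
	kept := svc.OwnerReferences[:0]
	for _, ref := range svc.OwnerReferences {
		if ref.UID != parentUID {
			kept = append(kept, ref)
		}
	}
	svc.OwnerReferences = kept
}

func main() {
	svc := &Service{
		Name:            "demo-rpc-rpc",
		OwnerReferences: []OwnerRef{{UID: "parent-uid", Controller: true}},
	}
	orphanInternalService(svc, "parent-uid")
	fmt.Println(len(svc.OwnerReferences)) // no owners left: Service survives
}
```

Keeping this in its own method (rather than inside `orphanNetworkingResources`) matches the Service's unconditional lifecycle: a `.spec.networking → nil` transition tears down networking resources without touching the internal Service.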

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit 20a1a90 into main Apr 17, 2026
2 checks passed
bdchatham added a commit that referenced this pull request Apr 17, 2026
…only ports)

Picks up the post-#99 controller binary:
- new SeiNodeDeployment.status.internalService field
- per-deployment ClusterIP Service with the stateless HTTP port set
  (rpc/evm-http/rest only — evm-ws and grpc deliberately excluded)
- internal Service lifecycle is independent of .spec.networking

Unblocks the autobake workflow (platform repo, M2b) which reads
status.internalService.name to dial the chain's RPC.

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:20a1a9038725109d434f3940797200afaf75aa44
  sha256:05ee5a60d3541c10e0409086381284a1e1695aabd771a14b049de170e1ac0a37

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
…only ports) (#100)

Picks up the post-#99 controller binary:
- new SeiNodeDeployment.status.internalService field
- per-deployment ClusterIP Service with the stateless HTTP port set
  (rpc/evm-http/rest only — evm-ws and grpc deliberately excluded)
- internal Service lifecycle is independent of .spec.networking

Unblocks the autobake workflow (platform repo, M2b) which reads
status.internalService.name to dial the chain's RPC.

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:20a1a9038725109d434f3940797200afaf75aa44
  sha256:05ee5a60d3541c10e0409086381284a1e1695aabd771a14b049de170e1ac0a37

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Controller pods crash on startup with:

  Failed to initialize OTel MeterProvider
  error: building OTel resource: conflicting Schema URL:
    https://opentelemetry.io/schemas/1.40.0 and
    https://opentelemetry.io/schemas/1.26.0

resource.Merge rejects merging resources whose schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.

Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.

Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop CrashLoopBackOff and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
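The failure mode above can be modeled in a few lines: the OTel SDK's `resource.Merge` refuses to combine two resources whose schema URLs differ. This stdlib-only sketch mimics that check with stand-in types (the real types live in `go.opentelemetry.io/otel/sdk/resource`):

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a stand-in for the OTel SDK resource; only the schema URL
// matters for this failure mode.
type Resource struct{ SchemaURL string }

// merge mimics resource.Merge's behavior: merging fails when both
// resources carry non-empty, differing schema URLs.
func merge(a, b Resource) (Resource, error) {
	if a.SchemaURL != "" && b.SchemaURL != "" && a.SchemaURL != b.SchemaURL {
		return Resource{}, errors.New(
			"conflicting Schema URL: " + a.SchemaURL + " and " + b.SchemaURL)
	}
	return a, nil
}

func main() {
	sdkDefault := Resource{SchemaURL: "https://opentelemetry.io/schemas/1.40.0"}

	// Before the fix: the overlay pinned semconv v1.26.0 -> startup crash.
	_, err := merge(sdkDefault, Resource{SchemaURL: "https://opentelemetry.io/schemas/1.26.0"})
	fmt.Println(err != nil) // true: conflicting schema URLs

	// After the fix: overlay bumped to v1.40.0, URLs agree.
	_, err = merge(sdkDefault, Resource{SchemaURL: "https://opentelemetry.io/schemas/1.40.0"})
	fmt.Println(err == nil) // true: merge succeeds
}
```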
bdchatham added a commit that referenced this pull request Apr 17, 2026
Controller pods crash on startup with:

  Failed to initialize OTel MeterProvider
  error: building OTel resource: conflicting Schema URL:
    https://opentelemetry.io/schemas/1.40.0 and
    https://opentelemetry.io/schemas/1.26.0

resource.Merge rejects merging resources whose schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.

Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.

Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop CrashLoopBackOff and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Picks up the OTel schema-URL fix from #101, which unblocks the post-#99
controller image. Prior image (20a1a90) crashes at startup with:

  Failed to initialize OTel MeterProvider
  conflicting Schema URL: ...1.40.0 and ...1.26.0

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
  sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification:
  kubectl -n sei-k8s-controller-system get pods
  # expect: all pods Running on the new image, no CrashLoopBackOff

Unblocks autobake workflow (platform#101) which reads
status.internalService.name from the post-#99 controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Picks up the OTel schema-URL fix from #101, which unblocks the post-#99
controller image. Prior image (20a1a90) crashes at startup with:

  Failed to initialize OTel MeterProvider
  conflicting Schema URL: ...1.40.0 and ...1.26.0

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
  sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification:
  kubectl -n sei-k8s-controller-system get pods
  # expect: all pods Running on the new image, no CrashLoopBackOff

Unblocks autobake workflow (platform#101) which reads
status.internalService.name from the post-#99 controller.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>