
feat: per-deployment internal RPC ClusterIP Service + status.rpcService#99

Merged
bdchatham merged 2 commits into main from feat/seinodedeployment-rpc-service
Apr 17, 2026
Conversation

@bdchatham
Collaborator

Summary

Adds a cluster-internal ClusterIP Service to every SeiNodeDeployment so in-cluster consumers can dial a single stable DNS name — {deployment}-rpc.{namespace}.svc — rather than picking an ordinal on the per-node headless Service and handling failover themselves. kube-proxy L4 load-balances across ready child pods.

Reconciled unconditionally — this path is independent of .spec.networking and lives alongside (not replacing) the existing external reconcileNetworking / HTTPRoute / External-DNS pipeline.

References

Interface contract (frozen)

```yaml
status:
  rpcService:
    name: <deployment-name>-rpc
    namespace: <namespace>
    ports:
      rpc: 26657
      evmHttp: 8545
      evmWs: 8546
      rest: 1317
      grpc: 9090
```

  • Service: ClusterIP, `PublishNotReadyAddresses: false`.
  • Selector: `sei.io/nodedeployment=<deployment>` (already stamped on child pod templates via `generateSeiNode`'s `spec.PodLabels[groupLabel]`).
  • OwnerReferences point at the parent SeiNodeDeployment (cascade delete on `DeletionPolicy: Delete`; orphaned via `orphanNetworkingResources` on Retain).
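The contract above pins a deterministic Service name, a group-only selector, and five frozen port names. As a rough sketch of that contract (simplified structs standing in for `corev1.Service`; the real generator lives in `internal_service.go` and returns a full Kubernetes object):

```go
package main

import "fmt"

// ServiceSpec is a simplified stand-in for the corev1.ServiceSpec fields
// the contract pins: selector and named ports. Not the controller's real type.
type ServiceSpec struct {
	Selector map[string]string
	Ports    map[string]int32 // port name -> container port
}

const groupLabel = "sei.io/nodedeployment" // label stamped on child pod templates

// generateInternalRpcService sketches the pure generator: deterministic
// "<deployment>-rpc" name, group-only selector (no revision label), and
// the five frozen named ports from the interface contract.
func generateInternalRpcService(deployment string) (string, ServiceSpec) {
	return deployment + "-rpc", ServiceSpec{
		Selector: map[string]string{groupLabel: deployment},
		Ports: map[string]int32{
			"rpc":      26657,
			"evm-http": 8545,
			"evm-ws":   8546,
			"rest":     1317,
			"grpc":     9090,
		},
	}
}

func main() {
	name, spec := generateInternalRpcService("demo-rpc")
	fmt.Println(name, spec.Selector[groupLabel], spec.Ports["evm-http"])
}
```

Because the generator is pure, the name, selector, and port set can be asserted without a cluster, which is what the generator tests in `internal_service_test.go` do.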

Files changed

| File | What |
| --- | --- |
| `api/v1alpha1/seinodedeployment_types.go` | New `RpcServiceStatus` / `RpcServicePorts` types; `.status.rpcService` pointer field. |
| `api/v1alpha1/zz_generated.deepcopy.go` | Regenerated. |
| `config/crd/sei.io_seinodedeployments.yaml`, `manifests/sei.io_seinodedeployments.yaml` | Regenerated CRD schema. |
| `internal/controller/nodedeployment/internal_service.go` (new) | `generateInternalRpcService` (pure) + `reconcileInternalRpcService` (SSA, stamps status in-memory). |
| `internal/controller/nodedeployment/internal_service_test.go` (new) | Generator tests + fake-client reconcile / orphan tests. |
| `internal/controller/nodedeployment/controller.go` | Invoke `reconcileInternalRpcService` unconditionally in the reconcile loop. |
| `internal/controller/nodedeployment/networking.go` | `orphanNetworkingResources` now also strips the ownerRef on the internal Service. |

Judgment calls / deviations

  • Port name `evm-http` (not seiconfig's `evm-rpc`): the milestone brief explicitly pins the five port names in the Service spec. We keep the brief's names, which also read better in kube-native tooling; the numeric port is unchanged.
  • Selector excludes revision label. The shared external Service pins to the active revision during a rollout (groupSelector); the internal Service uses groupOnlySelector so kube-proxy continues to route to whichever pods are Ready across incumbents/entrants. Covered by TestGenerateInternalRpcService_SelectorIgnoresRevision.
  • No new condition type (per non-goal): the Service is unconditionally reconciled, so there's nothing lifecycle-worthy to surface.
  • Single-patch model preserved: reconcileInternalRpcService mutates group.Status.RpcService in-memory. updateStatus calls the existing Status().Patch() at the end of every reconcile path (including the DNS-pending early return), so no second patch was added.
  • Label propagation was already correct. generateSeiNode already writes spec.PodLabels[groupLabel] = group.Name, which flows through noderesource.ResourceLabels onto the StatefulSet's pod template. No changes needed to the labeling path — the selector contract is already satisfied.
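The single-patch model in the fourth bullet can be sketched as follows (stand-in types and names mirror the PR's description, not the controller's actual API): each step mutates status in memory, and one flush at the end of the reconcile path issues the only status patch.

```go
package main

import "fmt"

// RpcServiceStatus is a simplified stand-in for the new status type.
type RpcServiceStatus struct{ Name, Namespace string }

// Deployment stands in for SeiNodeDeployment for this sketch.
type Deployment struct {
	Name, Namespace string
	Status          struct{ RpcService *RpcServiceStatus }
}

// reconcileInternalRpcService stamps status in memory only; it issues no
// API patch of its own, preserving the single-patch model.
func reconcileInternalRpcService(group *Deployment) {
	group.Status.RpcService = &RpcServiceStatus{
		Name:      group.Name + "-rpc",
		Namespace: group.Namespace,
	}
}

// updateStatus models the single flush point every reconcile path hits
// (the existing Status().Patch() call in the real controller).
func updateStatus(group *Deployment) string {
	return fmt.Sprintf("PATCH status: rpcService=%s/%s",
		group.Status.RpcService.Namespace, group.Status.RpcService.Name)
}

func main() {
	g := &Deployment{Name: "demo-rpc", Namespace: "sei"}
	reconcileInternalRpcService(g) // in-memory mutation only
	fmt.Println(updateStatus(g))   // one patch at the end of the reconcile
}
```

The benefit is that every early-return path (including the DNS-pending one) still flushes whatever status was stamped, without adding a second patch call.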

Manual testing

```shell
# 1. Build + load image, apply CRDs + controller.
make docker-build IMG=sei-k8s-controller:m1-test
# (load into your target cluster / kind)
kubectl apply -f manifests/sei.io_seinodedeployments.yaml

# 2. Apply a private (no .spec.networking) SeiNodeDeployment.
cat <<'YAML' | kubectl apply -f -
apiVersion: sei.io/v1alpha1
kind: SeiNodeDeployment
metadata:
  name: demo-rpc
  namespace: sei
spec:
  replicas: 2
  updateStrategy:
    type: InPlace
  template:
    spec:
      chainId: pacific-1
      image: ghcr.io/sei-protocol/seid:v1.0.0
      fullNode:
        snapshot:
          s3:
            targetHeight: 100000
YAML

# 3. Verify Service and status are present.
kubectl -n sei get svc demo-rpc-rpc -o yaml
kubectl -n sei get snd demo-rpc -o jsonpath='{.status.rpcService}'

# Expected:
#   Service demo-rpc-rpc with 5 named ports (rpc/evm-http/evm-ws/rest/grpc).
#   Selector: sei.io/nodedeployment=demo-rpc
#   OwnerReference -> demo-rpc (controller=true)
#   .status.rpcService.{name, namespace, ports.{rpc,evmHttp,evmWs,rest,grpc}} populated.

# 4. Once pods are Ready, DNS + traffic should work:
kubectl -n sei run curl --image=curlimages/curl --rm -it --restart=Never -- \
  curl -sS http://demo-rpc-rpc.sei.svc:26657/status

# 5. Cascade delete:
kubectl -n sei delete snd demo-rpc
kubectl -n sei get svc demo-rpc-rpc   # should be gone
```
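The DNS name the curl step dials follows the `{deployment}-rpc.{namespace}.svc` shape from the PR summary. A trivial Go helper makes the construction explicit (the helper name is ours, for illustration only):

```go
package main

import "fmt"

// serviceURL builds the cluster-internal URL for the per-deployment
// Service: http://{deployment}-rpc.{namespace}.svc:{port}. Illustrative
// helper; the controller does not ship this function.
func serviceURL(deployment, namespace string, port int) string {
	return fmt.Sprintf("http://%s-rpc.%s.svc:%d", deployment, namespace, port)
}

func main() {
	// Matches the manual-test curl target above.
	fmt.Println(serviceURL("demo-rpc", "sei", 26657))
}
```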

Test plan

  • make lint clean.
  • make test green (new tests: 9 generator + 5 reconcile/orphan cases).
  • make manifests generate idempotent after commit.
  • make build / make ci succeed.
  • Manual smoke against a real cluster (reviewer).

bdchatham and others added 2 commits April 17, 2026 11:21
Adds a cluster-internal ClusterIP Service to every SeiNodeDeployment so
in-cluster consumers can dial a single stable DNS name
({deployment}-rpc.{namespace}.svc) rather than chasing ordinals on
per-node headless Services. kube-proxy L4 load-balances across ready
child pods via the existing sei.io/nodedeployment pod label.

Reconciled unconditionally — lives alongside (not replacing) the
.spec.networking / HTTPRoute path.

- API: new RpcServiceStatus / RpcServicePorts types; additive pointer
  field .status.rpcService on SeiNodeDeployment.
- Generator: pure generateInternalRpcService with named ports
  (rpc/evm-http/evm-ws/rest/grpc) per the milestone interface contract.
- Reconcile: new reconcileInternalRpcService invoked from the deployment
  reconcile loop; populates status.rpcService in-memory for the existing
  single Status().Patch() flush.
- Orphan path: retain-policy now strips the internal Service's
  ownerRef alongside the external one.
- Tests: pure-generator and fake-client reconcile coverage (status
  stamping, ownerRef shape, idempotency, orphan path).

Ports use "evm-http" in the Service (not seiconfig's "evm-rpc") because
the milestone interface contract fixes those names for kube-native tools.

Refs: platform#96

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three concerns folded into one follow-up:

1. Rename `rpcService` → `internalService` (status field, types, Service
   name suffix, tests, godoc). The Service is the single internal access
   point; naming it mode-neutral ages better than "RPC"-scoped naming,
   especially with the stateful ports dropped below.

2. Drop stateful ports (evm-ws 8546, grpc 9090) from the Service and
   status schema. A kube-proxy L4 LB spreads connections across pods,
   which breaks WebSocket subscriptions and pins HTTP/2 gRPC
   per-connection — neither load-balances correctly. Remaining ports:
   rpc (26657), evm-http (8545), rest (1317) — all stateless HTTP
   request/response. Stateful consumers use per-node headless Services.

3. Move internal Service orphan handling out of
   `orphanNetworkingResources` into a new `orphanInternalService` method.
   The internal Service's lifecycle is unconditional; it should not be
   bundled with the networking-resources teardown. Added a test for
   `.spec.networking → nil` transitions confirming the internal Service
   survives.

All tests green (lint + test). CRD + DeepCopy regenerated via
`make manifests generate`.
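The orphan behavior in point 3 amounts to stripping the parent's ownerReference so a Retain deletion policy leaves the Service behind. A stdlib-only sketch (stand-in types; the real `orphanInternalService` patches a `corev1.Service` through the API client):

```go
package main

import "fmt"

// OwnerRef is a simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	UID        string
	Controller bool
}

// Service is a simplified stand-in for corev1.Service metadata.
type Service struct {
	Name            string
	OwnerReferences []OwnerRef
}

// orphanInternalService removes only the parent deployment's
// ownerReference, leaving any other references intact, so garbage
// collection no longer cascades the parent's deletion onto the Service.
func orphanInternalService(svc *Service, parentUID string) {
	kept := svc.OwnerReferences[:0]
	for _, ref := range svc.OwnerReferences {
		if ref.UID != parentUID {
			kept = append(kept, ref)
		}
	}
	svc.OwnerReferences = kept
}

func main() {
	svc := &Service{
		Name:            "demo-rpc-rpc",
		OwnerReferences: []OwnerRef{{UID: "parent-uid", Controller: true}},
	}
	orphanInternalService(svc, "parent-uid")
	fmt.Println(len(svc.OwnerReferences)) // no owners left: Service survives
}
```

Keeping this in its own method (rather than inside `orphanNetworkingResources`) matches the Service's unconditional lifecycle: a `.spec.networking → nil` transition tears down networking resources without touching the internal Service.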

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham merged commit 20a1a90 into main Apr 17, 2026
2 checks passed
bdchatham added a commit that referenced this pull request Apr 17, 2026
…only ports)

Picks up the post-#99 controller binary:
- new SeiNodeDeployment.status.internalService field
- per-deployment ClusterIP Service with the stateless HTTP port set
  (rpc/evm-http/rest only — evm-ws and grpc deliberately excluded)
- internal Service lifecycle is independent of .spec.networking

Unblocks the autobake workflow (platform repo, M2b) which reads
status.internalService.name to dial the chain's RPC.

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:20a1a9038725109d434f3940797200afaf75aa44
  sha256:05ee5a60d3541c10e0409086381284a1e1695aabd771a14b049de170e1ac0a37

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
…only ports) (#100)

Picks up the post-#99 controller binary:
- new SeiNodeDeployment.status.internalService field
- per-deployment ClusterIP Service with the stateless HTTP port set
  (rpc/evm-http/rest only — evm-ws and grpc deliberately excluded)
- internal Service lifecycle is independent of .spec.networking

Unblocks the autobake workflow (platform repo, M2b) which reads
status.internalService.name to dial the chain's RPC.

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:20a1a9038725109d434f3940797200afaf75aa44
  sha256:05ee5a60d3541c10e0409086381284a1e1695aabd771a14b049de170e1ac0a37

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Controller pods crash on startup with:

  Failed to initialize OTel MeterProvider
  error: building OTel resource: conflicting Schema URL:
    https://opentelemetry.io/schemas/1.40.0 and
    https://opentelemetry.io/schemas/1.26.0

resource.Merge rejects merging resources whose schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.

Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.

Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop CrashLoopBackOff and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
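The failure mode above can be modeled in a few lines: the OTel SDK's `resource.Merge` refuses to combine two resources whose schema URLs differ. This stdlib-only sketch mimics that check with stand-in types (the real types live in `go.opentelemetry.io/otel/sdk/resource`):

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a stand-in for the OTel SDK resource; only the schema URL
// matters for this failure mode.
type Resource struct{ SchemaURL string }

// merge mimics resource.Merge's behavior: merging fails when both
// resources carry non-empty, differing schema URLs.
func merge(a, b Resource) (Resource, error) {
	if a.SchemaURL != "" && b.SchemaURL != "" && a.SchemaURL != b.SchemaURL {
		return Resource{}, errors.New(
			"conflicting Schema URL: " + a.SchemaURL + " and " + b.SchemaURL)
	}
	return a, nil
}

func main() {
	sdkDefault := Resource{SchemaURL: "https://opentelemetry.io/schemas/1.40.0"}

	// Before the fix: the overlay pinned semconv v1.26.0 -> startup crash.
	_, err := merge(sdkDefault, Resource{SchemaURL: "https://opentelemetry.io/schemas/1.26.0"})
	fmt.Println(err != nil) // true: conflicting schema URLs

	// After the fix: overlay bumped to v1.40.0, URLs agree.
	_, err = merge(sdkDefault, Resource{SchemaURL: "https://opentelemetry.io/schemas/1.40.0"})
	fmt.Println(err == nil) // true: merge succeeds
}
```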
bdchatham added a commit that referenced this pull request Apr 17, 2026
Controller pods crash on startup with:

  Failed to initialize OTel MeterProvider
  error: building OTel resource: conflicting Schema URL:
    https://opentelemetry.io/schemas/1.40.0 and
    https://opentelemetry.io/schemas/1.26.0

resource.Merge rejects merging resources whose schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.

Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.

Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop CrashLoopBackOff and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Picks up the OTel schema-URL fix from #101, which unblocks the post-#99
controller image. Prior image (20a1a90) crashes at startup with:

  Failed to initialize OTel MeterProvider
  conflicting Schema URL: ...1.40.0 and ...1.26.0

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
  sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification:
  kubectl -n sei-k8s-controller-system get pods
  # expect: all pods Running on the new image, no CrashLoopBackOff

Unblocks autobake workflow (platform#101) which reads
status.internalService.name from the post-#99 controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham added a commit that referenced this pull request Apr 17, 2026
Picks up the OTel schema-URL fix from #101, which unblocks the post-#99
controller image. Prior image (20a1a90) crashes at startup with:

  Failed to initialize OTel MeterProvider
  conflicting Schema URL: ...1.40.0 and ...1.26.0

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
  sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification:
  kubectl -n sei-k8s-controller-system get pods
  # expect: all pods Running on the new image, no CrashLoopBackOff

Unblocks autobake workflow (platform#101) which reads
status.internalService.name from the post-#99 controller.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>