Peer config that resolves to zero instances should degrade gracefully, not block state-sync #393

## Problem

When a SeiNode peer entry (e.g. an `ec2Tags` discovery block) resolves to **zero** matching instances, for example because a region's EC2 validators have been decommissioned or terminated, a freshly deployed validator pod fails to complete state-sync and never enters the signing set.

Observed on arctic-1 **validator-11**: its SeiNode spec listed
```yaml
- ec2Tags:
    region: eu-central-1
    tags:
      ChainIdentifier: arctic-1
      Component: validators
```
but all eu-central-1 EC2 validators had been terminated (migrated to K8s), so that source resolved to nothing. The pod sat in `Initializing`, not signing, for **25+ min**, whereas sibling validators (12–19) came up in ~5–7 min. Deleting the dead eu-central-1 peer block and redeploying fixed it: the node was synced and signing in **~2 min**.

## Impact

Latent landmine across the migrated cohort: arctic-1 **validator-12 through -19 all still carry the same eu-central-1 `ec2Tags` peer entry**. They're signing now only because they bootstrapped while eu-central-1 still had instances. Any pod restart or redeploy puts them through the same fresh-sync path that hung validator-11, so a routine restart of any of the 8 live validators risks a stuck, non-signing node (and eventual downtime re-jail). The dead-peer manifest cleanup is the interim mitigation; this issue is the durable fix.

## Proposed approach

Peer/witness resolution (DiscoverPeers, plus the seictl peer-discovery and ConfigureStateSync witness computation) should treat a peer source that resolves to zero endpoints as a no-op with a warning, rather than letting it produce a broken or empty witness or persistent-peers set that wedges state-sync:

- (a) when an `ec2Tags` (or any) peer source returns 0 instances, **log + skip it** and continue with the remaining sources;
- (b) only fail hard if **all** sources resolve to zero **and** state-sync truly has no witnesses;
- (c) surface a condition/event ("peer source X resolved to 0") so an operator can see it rather than the pod silently hanging.
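
A minimal sketch of the behaviour in (a)–(c), to make the intended semantics concrete. It is not the existing DiscoverPeers or seictl code: `PeerSource`, `staticSource`, `ResolvePeers`, `ErrNoPeers`, the `us-east-1` source name, and the endpoint address are all hypothetical names chosen for illustration. It also assumes that a source returning an actual error (as opposed to an empty result) should still fail hard, which this issue does not decide.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// PeerSource is a hypothetical abstraction over one peer entry
// (for example a single ec2Tags block).
type PeerSource interface {
	Name() string
	Resolve() ([]string, error)
}

// staticSource is an illustrative stand-in; a real source would query EC2.
type staticSource struct {
	name      string
	endpoints []string
}

func (s staticSource) Name() string               { return s.name }
func (s staticSource) Resolve() ([]string, error) { return s.endpoints, nil }

// ErrNoPeers is returned only when every source resolved to zero endpoints (b).
var ErrNoPeers = errors.New("all peer sources resolved to zero endpoints")

// ResolvePeers merges endpoints from all sources. A source that resolves to
// zero endpoints is logged and skipped (a), and its name is returned in empty
// so the caller can surface a condition or event for it (c).
func ResolvePeers(sources []PeerSource) (peers []string, empty []string, err error) {
	for _, src := range sources {
		eps, rerr := src.Resolve()
		if rerr != nil {
			// Assumption: a lookup error is still a hard failure.
			return nil, empty, fmt.Errorf("resolve %s: %w", src.Name(), rerr)
		}
		if len(eps) == 0 {
			log.Printf("warning: peer source %q resolved to 0 endpoints; skipping", src.Name())
			empty = append(empty, src.Name())
			continue
		}
		peers = append(peers, eps...)
	}
	if len(peers) == 0 {
		return nil, empty, ErrNoPeers
	}
	return peers, empty, nil
}

func main() {
	sources := []PeerSource{
		staticSource{name: "ec2Tags/eu-central-1"}, // dead region: no instances
		staticSource{name: "ec2Tags/us-east-1", endpoints: []string{"10.0.0.1:26656"}},
	}
	peers, empty, err := ResolvePeers(sources)
	fmt.Println(peers, empty, err)
}
```

Returning the list of empty sources separately from the error keeps the "warn and continue" path distinct from the "nothing usable at all" path, so a reconciler could set a condition for each skipped source without failing the task.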

## Out of scope

- The manifest-side cleanup of removing the dead eu-central-1 `ec2Tags` entry from validator-12..19 (interim platform-repo change, tracked separately).
- Broader peer-drift automation.

## Relevant experts

- **kubernetes-specialist** — DiscoverPeers SeiNodeTask + reconcile
- **platform-engineer** — seictl sidecar peer-discovery + ConfigureStateSync
- **sei-network-specialist** — CometBFT state-sync witness/light-client semantics (why zero/dead witnesses wedge sync)

## References

- arctic-1 validator-11 incident 2026-06-10 — platform PRs #930 (delete dead peer + comment out), #931 (re-add for clean redeploy)
- Prior work: PR #386 (DiscoverPeers), #870 / #104 (ConfigureStateSync internal-RPC witnesses)

