
fix: align semconv import with OTel SDK schema URL (controller CrashLoop) #101

Merged

bdchatham merged 1 commit into main from fix/otel-semconv-schema-url on Apr 17, 2026

bdchatham (Collaborator) commented Apr 17, 2026

TL;DR

Controller pods crash on startup after the #100 image bump. The root cause is a semconv schema-URL mismatch introduced by the OTel work in #97/#98, not #99. A one-line import bump fixes it; verified locally.

Crash

{"level":"error","ts":"...","logger":"setup",
 "msg":"Failed to initialize OTel MeterProvider",
 "error":"building OTel resource: conflicting Schema URL:
         https://opentelemetry.io/schemas/1.40.0 and
         https://opentelemetry.io/schemas/1.26.0",
 "stacktrace":"main.main\n\t/workspace/cmd/main.go:107\n..."}

Root cause

cmd/telemetry.go builds its resource with resource.Merge(resource.Default(), resource.NewWithAttributes(semconv.SchemaURL, ...)). resource.Merge returns an error when the two resources carry different non-empty schema URLs, which is the case here:

  • resource.Default() reports schema URL https://opentelemetry.io/schemas/1.40.0 — the schema embedded in the OTel SDK (we're on go.opentelemetry.io/otel/sdk v1.43.0).
  • cmd/telemetry.go:14 imported semconv "go.opentelemetry.io/otel/semconv/v1.26.0", which contributes https://opentelemetry.io/schemas/1.26.0.

Bumping the import to semconv/v1.40.0 makes the two schema URLs match. All three symbols in use (semconv.SchemaURL, ServiceName, ServiceVersion) are stable across semconv versions, so the swap is drop-in.

Diff

-	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
+	semconv "go.opentelemetry.io/otel/semconv/v1.40.0"

No go.mod / go.sum changes: the semconv versions are subpackages of the go.opentelemetry.io/otel module, which is already resolved.
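Each semconv package's SchemaURL constant has the form https://opentelemetry.io/schemas/<version>, so the version in the import path decides whether the overlay agrees with resource.Default(). A hypothetical CI guard along these lines could catch future drift; schemaURLForImport and checkAligned are illustrative names, not part of OTel or this repo.

```go
package main

import (
	"fmt"
	"strings"
)

const semconvPrefix = "go.opentelemetry.io/otel/semconv/v"

// schemaURLForImport maps a versioned semconv import path to the schema URL
// its SchemaURL constant carries.
func schemaURLForImport(importPath string) (string, error) {
	version, ok := strings.CutPrefix(importPath, semconvPrefix)
	if !ok || version == "" {
		return "", fmt.Errorf("not a versioned semconv import: %q", importPath)
	}
	return "https://opentelemetry.io/schemas/" + version, nil
}

// checkAligned reports an error if the semconv import would conflict with a
// resource whose schema URL is sdkSchemaURL (e.g. what resource.Default() reports).
func checkAligned(importPath, sdkSchemaURL string) error {
	url, err := schemaURLForImport(importPath)
	if err != nil {
		return err
	}
	if url != sdkSchemaURL {
		return fmt.Errorf("%s yields %s, but the SDK default is %s", importPath, url, sdkSchemaURL)
	}
	return nil
}

func main() {
	sdk := "https://opentelemetry.io/schemas/1.40.0"
	for _, p := range []string{
		"go.opentelemetry.io/otel/semconv/v1.26.0", // before this PR
		"go.opentelemetry.io/otel/semconv/v1.40.0", // after this PR
	} {
		if err := checkAligned(p, sdk); err != nil {
			fmt.Println("mismatch:", err)
		} else {
			fmt.Println("ok:", p)
		}
	}
}
```

Inside the repo itself, a unit test comparing semconv.SchemaURL with resource.Default().SchemaURL() would be the more direct check; the string mapping above only keeps the sketch dependency-free.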

Verified locally

  • go mod tidy — no changes.
  • go build ./... — clean.
  • make lint — 0 issues.
  • make test — green.

Current prod impact

  • The Deployment has 4 pods instead of 3: 1 new pod in CrashLoopBackOff plus 3 healthy old pods on the pre-bump image. The rolling update is stuck because maxUnavailable=0 means no old pod can be terminated until a new pod becomes Ready.
  • Existing chains (pacific-1, atlantic-2, arctic-1) are unaffected; the old pods continue reconciling.
  • Autobake M2b is blocked because the old pods don't populate .status.internalService.
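The 4-pod, stuck-rollout state follows from RollingUpdate arithmetic. A minimal sketch, assuming 3 replicas and the Kubernetes defaults of 25% for both maxSurge and maxUnavailable (the PR only confirms maxUnavailable=0; the surge setting is an assumption). Kubernetes rounds maxSurge up and maxUnavailable down, so 25% of 3 resolves to a surge of 1 and 0 unavailable. rolloutBounds and canRetireOldPod are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
)

// rolloutBounds resolves percentage RollingUpdate settings the way the
// Deployment controller does: maxSurge rounds up, maxUnavailable rounds down,
// and if both resolve to 0, maxUnavailable becomes 1 so a rollout can progress.
func rolloutBounds(replicas int, surgePct, unavailablePct float64) (maxPods, minAvailable int) {
	surge := int(math.Ceil(float64(replicas) * surgePct / 100))
	unavailable := int(math.Floor(float64(replicas) * unavailablePct / 100))
	if surge == 0 && unavailable == 0 {
		unavailable = 1
	}
	return replicas + surge, replicas - unavailable
}

// canRetireOldPod reports whether scaling down one Ready pod keeps the
// Deployment at or above minAvailable. Crash-looping pods are never Ready.
func canRetireOldPod(readyPods, minAvailable int) bool {
	return readyPods-1 >= minAvailable
}

func main() {
	maxPods, minAvailable := rolloutBounds(3, 25, 25)
	fmt.Println("max pods:", maxPods, "min available:", minAvailable)

	// 3 old pods Ready, 1 new pod in CrashLoopBackOff (not Ready).
	fmt.Println("can retire an old pod:", canRetireOldPod(3, minAvailable))
}
```

With these inputs the controller may run at most 4 pods and must keep 3 available, so it can neither add another new pod nor remove an old one until the crash-looping pod becomes Ready.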

Apply plan

  1. Merge this PR.
  2. ecr.yml workflow auto-publishes the new image on push to main.
  3. Open a chore/bump PR bumping config/manager/manager.yaml to the new SHA. (I'm happy to do that one-liner once this lands.)
  4. Flux reconciles → new pods roll out cleanly → old pods terminate.
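Step 3 is a one-line edit to the image reference in config/manager/manager.yaml. A minimal sketch of that edit, assuming the manifest pins the controller as `image: <repo>:<full commit SHA>` (the repository path and SHA-style tag match the follow-up bump commit on this PR; the manifest fragment and bumpImageTag are hypothetical, not the repo's tooling):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fullSHA matches a full 40-character lowercase hex commit SHA.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// bumpImageTag rewrites the tag on every "image: <repo>:<tag>" line for repo.
// It rejects anything that is not a full commit SHA.
func bumpImageTag(manifest, repo, newSHA string) (string, error) {
	if !fullSHA.MatchString(newSHA) {
		return "", fmt.Errorf("not a full commit SHA: %q", newSHA)
	}
	re := regexp.MustCompile(`(image:\s*` + regexp.QuoteMeta(repo) + `):\S+`)
	if !re.MatchString(manifest) {
		return "", fmt.Errorf("no image line for %s", repo)
	}
	// ${1} keeps "image: <repo>" and the original indentation; only the tag changes.
	return re.ReplaceAllString(manifest, "${1}:"+newSHA), nil
}

func main() {
	repo := "189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller"
	// Illustrative fragment; OLD_SHA is a placeholder, not the real previous tag.
	manifest := strings.Join([]string{
		"      containers:",
		"        - name: manager",
		"          image: " + repo + ":OLD_SHA",
	}, "\n")

	updated, err := bumpImageTag(manifest, repo, "d122d39d5863a391d879cee2abdab5808a631db3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(updated)
}
```

Requiring the full SHA avoids accidentally pinning a short SHA like d122d39 that does not exist as an ECR tag.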

🤖 Generated with Claude Code

Controller pods crash on startup with:

  Failed to initialize OTel MeterProvider
  error: building OTel resource: conflicting Schema URL:
    https://opentelemetry.io/schemas/1.40.0 and
    https://opentelemetry.io/schemas/1.26.0

resource.Merge rejects merging resources whose non-empty schema URLs differ.
resource.Default() reports schema v1.40.0 (embedded in the SDK at
v1.43.0), while cmd/telemetry.go hardcoded semconv/v1.26.0 as the
schema for the custom resource overlay.

Bump the semconv import to v1.40.0 so the two schema URLs agree.
All three symbols in use here (semconv.SchemaURL, ServiceName,
ServiceVersion) are stable across semconv versions — drop-in
substitution.

Unblocks the controller image bump that #100 landed. Post-#99
controller pods stop crash-looping and roll out cleanly, which
in turn unblocks SeiNodeDeployment.status.internalService for the
autobake workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bdchatham merged commit d122d39 into main Apr 17, 2026
2 checks passed
bdchatham added a commit that referenced this pull request Apr 17, 2026
Picks up the OTel schema-URL fix from #101, which unblocks the post-#99
controller image. Prior image (20a1a90) crashes at startup with:

  Failed to initialize OTel MeterProvider
  conflicting Schema URL: ...1.40.0 and ...1.26.0

Image:
  189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:d122d39d5863a391d879cee2abdab5808a631db3
  sha256:6a0a11bd2b135777d7bf4973f4009553b49a3cd4d2bfe41e08947e6a1780fde4

Post-Flux-sync verification:
  kubectl -n sei-k8s-controller-system get pods
  # expect: all pods Running on the new image, no CrashLoopBackOff

Unblocks the autobake workflow (platform#101), which reads
status.internalService.name from the post-#99 controller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
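The post-Flux-sync check can be scripted. A minimal sketch, assuming the default `kubectl get pods` table layout (NAME, READY, STATUS, RESTARTS, AGE); checkPods is a hypothetical helper and the pod names in the sample output are made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkPods parses default `kubectl get pods` table output and returns an
// error if any pod is not Running with all containers Ready.
func checkPods(out string) error {
	lines := strings.Split(strings.TrimSpace(out), "\n")
	if len(lines) < 2 {
		return errors.New("no pods listed")
	}
	var bad []string
	for _, line := range lines[1:] { // skip the header row
		f := strings.Fields(line)
		if len(f) < 3 {
			return fmt.Errorf("unexpected line: %q", line)
		}
		name, ready, status := f[0], f[1], f[2]
		parts := strings.SplitN(ready, "/", 2)
		if status != "Running" || len(parts) != 2 || parts[0] != parts[1] {
			bad = append(bad, name+" "+ready+" "+status)
		}
	}
	if len(bad) > 0 {
		return fmt.Errorf("unhealthy pods: %s", strings.Join(bad, "; "))
	}
	return nil
}

func main() {
	// Illustrative output shaped like the stuck rollout; pod names are hypothetical.
	out := `NAME                       READY   STATUS             RESTARTS      AGE
controller-manager-old-a   1/1     Running            0             3d
controller-manager-old-b   1/1     Running            0             3d
controller-manager-old-c   1/1     Running            0             3d
controller-manager-new-x   0/1     CrashLoopBackOff   9 (40s ago)   25m`

	if err := checkPods(out); err != nil {
		fmt.Println("rollout not healthy:", err)
		return
	}
	fmt.Println("all pods Running and Ready")
}
```

The default table has no image column, so confirming the pods run d122d39 needs a separate query such as `kubectl -n sei-k8s-controller-system get pods -o jsonpath='{.items[*].spec.containers[*].image}'`.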