
feat(observability): add /api/metrics + opt-in OTLP metrics push#9

Draft
aryasaatvik wants to merge 1 commit into `dev` from `feat/otlp-http-observability`

Stacked review copy of upstream RhysSullivan#403 · independent of the execution-history stack (merges directly into dev).


See upstream PR #403 for the full description.

Wires up two independent ways to get Effect's in-process metrics out:
a pull endpoint that is always on locally and auth-gated on cloud, plus
an opt-in OTLP push path that the self-host daemon enables via an env var.

## What ships

### Pull: `GET /api/metrics` in `@executor/api`

- New `MetricsApi` group + handler. Returns Prometheus text exposition
  format built from `Metric.unsafeSnapshot()` — hand-rolled serializer
  in `packages/core/api/src/metrics/prometheus.ts` (~170 lines, no
  new deps). Handles counters, gauges, histograms (cumulative bucket
  form + `+Inf` + `_count` + `_sum`), summaries, and frequencies.
- Registered in `CoreExecutorApi` + `CoreHandlers`, so both
  `apps/local` and `apps/cloud` expose the endpoint. Local mounts
  unconditionally (self-host operator can scrape); cloud inherits the
  `OrgAuth` middleware, so each org sees only its own metrics.
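To make the histogram handling concrete, here is a minimal sketch of the cumulative-bucket serialization for counters, gauges, and histograms (summaries and frequencies are omitted). It is not the code in `packages/core/api/src/metrics/prometheus.ts`: the `Sample` shapes and `toPrometheus` are hypothetical stand-ins for whatever `Metric.unsafeSnapshot()` yields, and the input deliberately stores per-bucket (non-cumulative) counts so the accumulation step is visible.

```typescript
// Hypothetical snapshot shapes; the real serializer reads Effect's
// Metric.unsafeSnapshot() output, whose types are not reproduced here.
type Labels = Record<string, string>;

type Scalar = {
  kind: "counter" | "gauge";
  name: string;
  labels: Labels;
  value: number;
};

type Histogram = {
  kind: "histogram";
  name: string;
  labels: Labels;
  // Finite upper bound -> observations in that bucket alone (not cumulative).
  buckets: ReadonlyArray<readonly [number, number]>;
  count: number;
  sum: number;
};

type Sample = Scalar | Histogram;

// Label values escape backslash, double quote, and newline in the text format.
const escapeLabel = (v: string): string =>
  v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");

const renderLabels = (labels: Labels): string => {
  const pairs = Object.entries(labels).map(([k, v]) => `${k}="${escapeLabel(v)}"`);
  return pairs.length === 0 ? "" : `{${pairs.join(",")}}`;
};

const formatValue = (n: number): string =>
  Number.isNaN(n) ? "NaN" : n === Infinity ? "+Inf" : n === -Infinity ? "-Inf" : String(n);

export const toPrometheus = (samples: ReadonlyArray<Sample>): string => {
  // Group by metric name: one # TYPE line per family, all its series contiguous.
  const families = new Map<string, Sample[]>();
  for (const s of samples) {
    const group = families.get(s.name);
    if (group) group.push(s);
    else families.set(s.name, [s]);
  }

  const lines: string[] = [];
  for (const [name, group] of families) {
    lines.push(`# TYPE ${name} ${group[0].kind}`);
    for (const s of group) {
      if (s.kind !== "histogram") {
        lines.push(`${name}${renderLabels(s.labels)} ${formatValue(s.value)}`);
        continue;
      }
      // Buckets are cumulative: le="x" counts every observation <= x.
      let cumulative = 0;
      const sorted = [...s.buckets].sort((a, b) => a[0] - b[0]);
      for (const [le, n] of sorted) {
        cumulative += n;
        lines.push(`${name}_bucket${renderLabels({ ...s.labels, le: formatValue(le) })} ${cumulative}`);
      }
      // The +Inf bucket always equals the total observation count.
      lines.push(`${name}_bucket${renderLabels({ ...s.labels, le: "+Inf" })} ${s.count}`);
      lines.push(`${name}_sum${renderLabels(s.labels)} ${formatValue(s.sum)}`);
      lines.push(`${name}_count${renderLabels(s.labels)} ${s.count}`);
    }
  }
  return lines.join("\n") + "\n";
};
```

Grouping by name before emitting matters because the exposition format allows only one `# TYPE` line per metric name and expects all lines of a family to appear as a single group.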

### Push: OTLP metrics layer in `apps/local`

- `apps/local/src/server/telemetry.ts` wires
  `@effect/opentelemetry/OtlpMetrics.layer` behind
  `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` (OTel-standard env var). Absent =
  no push, no cost. Auth via comma-separated key=value pairs in
  `OTEL_EXPORTER_OTLP_METRICS_HEADERS` — Axiom users set
  `Authorization=Bearer xxx,X-Axiom-Dataset=executor-local`.
- Installs in a module-scope `ManagedRuntime` so the exporter's
  `exportInterval` timer keeps ticking across requests. Per-request
  scoping would shut the exporter down before the first batch leaves.
- Booted once from `createServerHandlers` via `startMetricsExport()`.
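The env-var contract above can be sketched as a small config reader. This is a hypothetical helper, not the code in `telemetry.ts`: `readOtlpMetricsConfig`, `parseOtlpHeaders`, and `OtlpMetricsConfig` are illustrative names, and the sketch assumes the header string is plain comma-separated `key=value` pairs (it does not percent-decode values).

```typescript
// Hypothetical config shape for the opt-in OTLP metrics push.
export interface OtlpMetricsConfig {
  readonly url: string;
  readonly headers: Record<string, string>;
}

// Parse OTEL_EXPORTER_OTLP_METRICS_HEADERS, e.g.
// "Authorization=Bearer xxx,X-Axiom-Dataset=executor-local".
// Splits each pair on its first "=" so values containing "=" survive.
export const parseOtlpHeaders = (raw: string | undefined): Record<string, string> => {
  const headers: Record<string, string> = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip entries with no key or no "="
    const key = pair.slice(0, idx).trim();
    if (key.length === 0) continue;
    headers[key] = pair.slice(idx + 1).trim();
  }
  return headers;
};

// A missing or blank endpoint yields undefined: no exporter, no timer, no cost.
export const readOtlpMetricsConfig = (
  env: Record<string, string | undefined>,
): OtlpMetricsConfig | undefined => {
  const url = env.OTEL_EXPORTER_OTLP_METRICS_ENDPOINT?.trim();
  if (!url) return undefined;
  return { url, headers: parseOtlpHeaders(env.OTEL_EXPORTER_OTLP_METRICS_HEADERS) };
};
```

Under Bun or Node this would be called as `readOtlpMetricsConfig(process.env)`; returning `undefined` lets the caller skip building the OTLP layer entirely rather than constructing a disabled exporter.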

### Cloud push path: deliberately not included

Cloud already has tracing via `@microlabs/otel-cf-workers`. Metrics
push would add cost + cardinality risk that nobody's asked for. The
pull endpoint (auth-gated) is there for ops teams who want to scrape;
OTLP push can be added later when there's a concrete need.

## Dependency alignment

- `@effect/opentelemetry@^0.63.0` added to `apps/local` (already in
  the workspace via `apps/cloud`; no new package fetched).
- No removals from main's dep graph.
- No changes to `apps/cloud` deps or wiring.

## Test plan

- [x] `bun x vitest run` in `@executor/api` — 8/8 (4 new prometheus
  serializer tests + 4 existing).
- [x] `bun x vitest run` in `@executor/sdk` — 90/90.
- [x] `bun x vitest run` in `@executor/hosts/mcp` — 23/23.
- [x] `bun x tsc --noEmit` in `@executor/api`, `apps/local`,
  `apps/cloud` — all clean.
- [ ] Dev-server smoke (reviewer): start the daemon, run
  `curl localhost:4788/api/metrics`, and verify Prometheus-format output.
  Optionally set `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` + headers,
  trigger some executions, verify metrics appear in the backend.
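For the manual smoke step, a reviewer could feed the `curl` output into a quick checker like the one below. It is a hypothetical helper, not part of this PR, and it verifies only two histogram invariants: bucket values never decrease within a series, and the `le="+Inf"` bucket equals `_count`. Its sample regex does not handle label values that contain `}`.

```typescript
// Hypothetical reviewer helper for the /api/metrics smoke test.
// Limitation: label values containing "}" are not parsed correctly.
const SAMPLE = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)(?:\s+-?\d+)?$/;
const LABEL = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;

const parseValue = (s: string): number =>
  s === "+Inf" ? Infinity : s === "-Inf" ? -Infinity : Number(s);

export const checkExposition = (text: string): string[] => {
  const problems: string[] = [];
  const lastBucket = new Map<string, number>(); // series -> last bucket value seen
  const infBucket = new Map<string, number>(); // series -> value of le="+Inf"
  const counts = new Map<string, number>(); // series -> value of <name>_count

  text.split("\n").forEach((raw, i) => {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) return; // blanks, HELP and TYPE lines
    const m = SAMPLE.exec(line);
    if (!m) {
      problems.push(`line ${i + 1}: unparseable sample`);
      return;
    }
    const name = m[1];
    const value = parseValue(m[3]);
    if (Number.isNaN(value) && m[3] !== "NaN") problems.push(`line ${i + 1}: bad value`);

    const pairs = [...(m[2] ?? "").matchAll(LABEL)];
    const le = pairs.find((p) => p[1] === "le")?.[2];
    // Series identity ignores `le` so every bucket of one histogram shares a key.
    const rest = pairs.filter((p) => p[1] !== "le").map((p) => p[0]).join(",");

    if (name.endsWith("_bucket") && le !== undefined) {
      const key = `${name.slice(0, -"_bucket".length)}{${rest}}`;
      const prev = lastBucket.get(key);
      if (prev !== undefined && value < prev) problems.push(`line ${i + 1}: bucket value decreased`);
      lastBucket.set(key, value);
      if (le === "+Inf") infBucket.set(key, value);
    } else if (name.endsWith("_count")) {
      counts.set(`${name.slice(0, -"_count".length)}{${rest}}`, value);
    }
  });

  for (const [key, inf] of infBucket) {
    const count = counts.get(key);
    if (count !== undefined && count !== inf) problems.push(`${key}: +Inf bucket ${inf} != _count ${count}`);
  }
  return problems;
};
```

An empty result means the histogram invariants held for every series found; it says nothing about which metrics *should* be present.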

## Follow-ups

- Dashboards for Axiom / Grafana / Prometheus. Not blocking this PR.
- Cardinality guardrails if cloud push path is ever added.