feat(observability): add /api/metrics + opt-in OTLP metrics push #403
Draft
aryasaatvik wants to merge 1 commit into RhysSullivan:main
Wires two independent ways to get Effect's in-process metrics out: a pull endpoint that's always on locally and auth-gated on cloud, plus an opt-in OTLP push path that the self-host daemon enables via env var.

## What ships

### Pull: `GET /api/metrics` in `@executor/api`

- New `MetricsApi` group + handler. Returns Prometheus text exposition format built from `Metric.unsafeSnapshot()`, using a hand-rolled serializer in `packages/core/api/src/metrics/prometheus.ts` (~170 lines, no new deps). Handles counters, gauges, histograms (cumulative bucket form + `+Inf` + `_count` + `_sum`), summaries, and frequencies.
- Registered in `CoreExecutorApi` + `CoreHandlers`, so both `apps/local` and `apps/cloud` expose the endpoint. Local mounts unconditionally (self-host operator can scrape); cloud inherits the `OrgAuth` middleware so each org only sees their own metrics.

### Push: OTLP metrics layer in `apps/local`

- `apps/local/src/server/telemetry.ts` wires `@effect/opentelemetry/OtlpMetrics.layer` behind `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` (OTel-standard env var). Absent = no push, no cost. Auth via comma-separated key=value pairs in `OTEL_EXPORTER_OTLP_METRICS_HEADERS`; for example, Axiom users set `Authorization=Bearer xxx,X-Axiom-Dataset=executor-local`.
- Installs in a module-scope `ManagedRuntime` so the exporter's `exportInterval` timer keeps ticking across requests. Per-request scoping would shut the exporter down before the first batch leaves.
- Booted once from `createServerHandlers` via `startMetricsExport()`.

### Cloud push path: deliberately not included

Cloud already has tracing via `@microlabs/otel-cf-workers`. Metrics push would add cost + cardinality risk that nobody's asked for. The pull endpoint (auth-gated) is there for ops teams who want to scrape; OTLP push can be added later when there's a concrete need.

## Dependency alignment

- `@effect/opentelemetry@^0.63.0` added to `apps/local` (already in the workspace via `apps/cloud`; no new package fetched).
- No removals from main's dep graph.
- No changes to `apps/cloud` deps or wiring.

## Test plan

- [x] `bun x vitest run` in `@executor/api`: 8/8 (4 new prometheus serializer tests + 4 existing).
- [x] `bun x vitest run` in `@executor/sdk`: 90/90.
- [x] `bun x vitest run` in `@executor/hosts/mcp`: 23/23.
- [x] `bun x tsc --noEmit` in `@executor/api`, `apps/local`, `apps/cloud`: all clean.
- [ ] Dev-server smoke (reviewer): start the daemon, `curl localhost:4788/api/metrics`, and verify Prometheus-format output. Optionally set `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` + headers, trigger some executions, and verify metrics appear in the backend.

## Follow-ups

- Dashboards for Axiom / Grafana / Prometheus. Not blocking this PR.
- Cardinality guardrails if cloud push path is ever added.
Stack
Independent of the execution-history stack. Can land in any order.
Summary
Two complementary ways to get Effect's in-process metrics out — one always-on (pull), one opt-in (push). Addresses the "metrics collected but never exported" gap. Ships with zero new external dependencies — reuses `@effect/opentelemetry` which is already in the workspace.
What ships
Pull: `GET /api/metrics` in `@executor/api`
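For reviewers unfamiliar with the Prometheus histogram layout, here is a minimal sketch of the "cumulative bucket form + `+Inf` + `_count` + `_sum`" shape that the pull endpoint is described as emitting. It is not the code in `prometheus.ts`: the `HistogramInput` type, the `renderHistogram` function, and the metric name are hypothetical, and the input is assumed to carry per-bucket (non-cumulative) counts purely for illustration.

```typescript
// Hypothetical input shape for illustration only; the real serializer
// reads Effect's metric snapshot rather than this type.
interface HistogramInput {
  name: string;
  // Per-bucket observation counts, sorted by ascending upper bound `le`.
  buckets: ReadonlyArray<{ le: number; count: number }>;
  // Total number of observations, including any above the largest bound.
  count: number;
  // Sum of all observed values.
  sum: number;
}

// Render one histogram in Prometheus text exposition format.
function renderHistogram(h: HistogramInput): string {
  const lines: string[] = [`# TYPE ${h.name} histogram`];
  let running = 0;
  for (const bucket of h.buckets) {
    // Prometheus buckets are cumulative: each line counts every
    // observation less than or equal to its upper bound.
    running += bucket.count;
    lines.push(`${h.name}_bucket{le="${bucket.le}"} ${running}`);
  }
  // The +Inf bucket always equals the total observation count.
  lines.push(`${h.name}_bucket{le="+Inf"} ${h.count}`);
  lines.push(`${h.name}_sum ${h.sum}`);
  lines.push(`${h.name}_count ${h.count}`);
  return lines.join("\n") + "\n";
}

// Example with a made-up metric name and made-up values.
const rendered = renderHistogram({
  name: "executor_tool_duration_ms",
  buckets: [
    { le: 10, count: 2 },
    { le: 100, count: 3 },
  ],
  count: 6,
  sum: 512,
});
console.log(rendered);
```

With the values above, the `le="100"` line reports 5 (2 + 3) and the `+Inf` line reports 6, which is the invariant a scraper relies on when computing quantiles from buckets.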
Push: OTLP metrics layer in `apps/local` only
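The push path takes its auth from comma-separated key=value pairs in `OTEL_EXPORTER_OTLP_METRICS_HEADERS`. A minimal sketch of how such a value could be turned into a header map follows. The `parseOtlpHeaders` name is hypothetical and this is not the parsing in `telemetry.ts`; it assumes values contain no literal commas and does no percent-decoding.

```typescript
// Hypothetical helper for illustration; not the parser used in this PR.
// Parses "k1=v1,k2=v2" into a plain header map.
function parseOtlpHeaders(raw: string | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (!raw) return headers; // env var unset or empty: no headers

  for (const pair of raw.split(",")) {
    // Split on the first "=" only, so values that themselves contain "="
    // (for example base64 padding) are preserved intact.
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty segments and entries without a key
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}

// Example using the Axiom-style value quoted in the PR description.
const parsed = parseOtlpHeaders(
  "Authorization=Bearer xxx,X-Axiom-Dataset=executor-local",
);
console.log(parsed);
```

Returning an empty map for an unset variable keeps the caller free of special cases: the exporter can always be handed a header object, whether or not auth is configured.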
Cloud push path: deliberately not included
Cloud already has tracing via `@microlabs/otel-cf-workers`. Metrics push adds Axiom ingestion cost + cardinality risk that nobody's asked for. The pull endpoint (auth-gated) is there for ops who want to scrape; OTLP push is a follow-up if concretely needed.
Cardinality policy
Full `mcp.tool.name` dimensions kept (namespace + operation, e.g. `github.repos.get`). The primary consumer is self-host / local where cardinality is bounded by the operator's own plugin set. Any cloud-side cap is a follow-up.
Dependency alignment
Why Prometheus format for the pull endpoint
Test plan
Follow-ups (out of scope)