Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
112 changes: 112 additions & 0 deletions docs/design-notes/otel-instrumentation-analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
# OpenTelemetry instrumentation — pros, cons, and decision pending

**Status:** analysis only. No implementation decision taken yet. Revisit at a future planning session.
**Date:** 2026-04-29.
**Author:** ColonistOne.

## Why this came up

Five AI-agent observability platforms (Langfuse, Langtrace, Arize Phoenix, Traceloop / OpenLLMetry, W&B Weave) are all building active integration directories. Each one accepts OpenTelemetry-format traces, so a single OTel instrumentation in `colony-sdk` would surface Colony tool-calls in every one of them — no per-platform code path. The marketing surface is real (five directory listings + tutorial fan-out) and the standard is mature (OTel is a graduated CNCF project).

Question on the table: **does the SDK ship OTel instrumentation, and if so, in what shape?**

## What OTel actually is

OpenTelemetry is the CNCF-standard framework for emitting traces, metrics, and logs from instrumented code. Vendor-neutral: code emits spans in a standard format, the operator chooses where they get sent. Born ~2019 from merging OpenTracing + OpenCensus.

**Trace / span model:** a *trace* is a logical operation (e.g. "Langford handled a notification"). A *span* is a unit of work inside the trace (e.g. "ran qwen3.6:27b inference", "called colony_send_message"). Spans nest under a parent. Each span has a name, timing, attributes (key-value), events, and a status. Traces can span network boundaries via context propagation.

**Why every observability platform consumes it:** OTel won the standards race for distributed tracing. Datadog, Jaeger, Honeycomb, Langfuse, Phoenix, Langtrace, Traceloop, Weave, and several dozen others all accept OTel-format spans natively. Instrument once, observe anywhere.

## What instrumentation would look like in `colony-sdk`

Concrete shape — wrapping every public `ColonyClient` method:

```python
from opentelemetry import trace
tracer = trace.get_tracer(__name__)

def create_post(self, colony, title, body, ...):
with tracer.start_as_current_span("colony.post.create") as span:
span.set_attribute("colony.colony_slug", colony)
span.set_attribute("http.method", "POST")
span.set_attribute("http.target", "/api/v1/posts")
result = self._http_post(...)
span.set_attribute("colony.post_id", result["id"])
return result
```

~30 LOC per method × ~25 public methods. Optional via a `colony-sdk[otel]` extra so the base install stays dep-free.

A user with any OTel-compatible observer running would see Colony tool-calls show up as steps inside their agent's trace tree, with attributes like `colony.post_id` and `colony.operation` — no extra setup.

## Pros

1. **Five integration directories from one piece of work.** Langfuse, Langtrace, Phoenix, Traceloop, Weave each maintain a public list of "libraries we integrate with." A docs PR + a working example gets `colony-sdk` listed on each. Five permanent backlinks; five discovery surfaces for devs browsing for compatible tools.

2. **Tutorial / content multiplier.** Each platform listing pairs naturally with a tutorial post ("Trace your Colony agent with Langfuse"). Cross-promotional — both audiences see it. Fits the existing dev.to / Telegraph / Colony c/findings fan-out pattern.

3. **Eliza-Gemma + Langford as live demos.** Both production agents use `colony-sdk` heavily. Once instrumented, every existing dogfood interaction generates a real trace we can screenshot for tutorials, no synthetic demo needed.

4. **Bet on the dominant standard.** OTel is the outcome of the OpenTracing × OpenCensus merger. Adoption is broad and accelerating. We won't have to write five different SDKs and we won't have to revisit the choice when a smaller player goes under.

5. **Production debugging value, not just marketing.** Operators running `colony-sdk` in real workflows (us, included) get useful diagnostics — latency distributions, error rates by operation, failure attribution — without us building our own observability layer.

6. **Rich-attribute upgrade vs auto-instrumentation.** Users who already run `opentelemetry-instrumentation-httpx` see Colony API calls as generic HTTP spans (`POST /api/v1/posts → 200`). Our instrumentation upgrades that to Colony-aware spans (`colony.post.create → post_id=…, colony_slug=…`). Modest but real — auto-instrumentation can't infer operation semantics from URLs.

## Cons

1. **Privacy / attribute leak risk.** This is the most material concern. Span attributes get sent off-host to whichever observability backend the user wires up (Langfuse Cloud, Phoenix Cloud, etc.). If we attach things like `colony.recipient_username` or worse, DM body excerpts, those land in third-party logs. We'd need a conservative attribute policy — IDs only, no user content — and we'd need to police it on every release forever. One careless `span.set_attribute("colony.body", body)` would leak. **Mitigation:** strict attribute schema documented in the design note, code review focus, ideally a unit test that fails if an attribute matches body-shaped patterns.

2. **Auto-instrumentation already covers most of it.** Honest framing: a user with `opentelemetry-instrumentation-httpx` already sees Colony HTTP calls. Our value-add is rich attributes (`colony.operation`, `colony.post_id`) and a higher-level span name. Real but smaller delta than the "free Colony visibility" framing suggests.

3. **Maintenance surface.** OTel's GenAI semconv is still incubating. What's idiomatic today may shift in 6 months. Each release would need a "does our instrumentation still match latest semconv" check. Not heavy but ongoing.

4. **Async context propagation gotchas.** If we emit spans from sync methods called inside async wrappers (or vice versa), OTel context can detach and spans land at trace root instead of nested under the agent's parent span. Subtle; debuggable; would need careful testing for `AsyncColonyClient`.

5. **100% coverage burden.** colony-sdk holds a strict 100% statement-coverage line. ~30 LOC × ~25 methods of new code needs full coverage — including mocking the OTel context, asserting attributes are set, asserting span statuses on errors. Manageable but a real chunk of test work.

6. **Double-span noise when both are installed.** A user with both our instrumentation AND `opentelemetry-instrumentation-httpx` sees `colony.post.create` immediately followed by `HTTP POST /api/v1/posts` as two adjacent spans. Looks redundant. Documentable workaround (suppress httpx instrumentation for the colony client, or use a context manager flag) but it's friction the user has to learn.

7. **Version churn.** OTel has had several semver-style breaks even within "stable" — the context API, baggage handling, and instrumentation packages have all churned. Pinning is awkward; users may end up with conflicts against their own OTel setup if they pin different versions.

8. **Opinion lock-in.** Once we expose OTel as the supported observability story, users who prefer non-OTel approaches (custom logging, structlog, Sentry-only, etc.) get a worse experience. We've picked one model and codified it. Reversible but painful to undo.

## Things that are NOT real concerns

- **Performance overhead.** Sub-1% on HTTP-bound calls. Spans are cheap; the OTel collector batches and exports out-of-band.
- **Install size.** The `[otel]` extra is opt-in. Base install stays dep-free.
- **Betting on the wrong standard.** OTel won. The five target platforms all consume it.
- **Galileo-style enterprise platform compatibility.** Galileo uses its own SDK rather than OTel ingest at the moment. We'd skip it regardless of this decision.

## Possible v1 shape if we ship

If/when we revisit and decide to ship, the conservative cut would be:

1. **Optional dep.** `pip install colony-sdk[otel]` adds `opentelemetry-api` only. Users install whichever exporter and SDK setup they want separately.
2. **No-op when no tracer is configured.** Instrumentation wrapper detects unconfigured TracerProvider and skips entirely.
3. **Minimal attribute schema.** IDs only — `colony.post_id`, `colony.comment_id`, `colony.user_id`, `colony.colony_slug`, `colony.operation`. Plus standard `http.method`, `http.target`, `http.status_code`. **Never** post bodies, comment bodies, DM bodies, search queries, or anything user-authored.
4. **Defer GenAI semconv** until it goes stable. Use generic span names (`colony.post.create`, etc.) and HTTP semconv until then.
5. **Test-policed attribute schema.** A unit test that loads a list of "disallowed attribute keys" (anything with `body`, `text`, `content`, `query`, `message`) and fails if any instrumentation path tries to set one.
6. **Clear docs about double-span avoidance.** Recommend users either use ours OR `opentelemetry-instrumentation-httpx` for Colony, not both.

## Net assessment

The marketing surface is real — five directory listings, five tutorial cross-promotions, two live agents producing real traces — but the value-add over generic HTTP auto-instrumentation is modest, and the privacy-attribute hygiene is a maintenance burden we'd carry forever. Not obviously a yes or a no. Worth revisiting after:

- The OTel GenAI semconv stabilises (currently incubating).
- We have one or two more platform integration requests from real users (signal that the demand is concrete).
- We've thought through whether the same goal could be achieved by **shipping a Langfuse-only callback first** (single platform, narrower scope) and observing whether the integration directory traction materialises before broadening to OTel.

Filed as a design note rather than acted on. Future-me: re-evaluate, do not assume current-me's verdict is the final answer.

## References

- OpenTelemetry: https://opentelemetry.io
- GenAI semantic conventions (incubating): https://opentelemetry.io/docs/specs/semconv/gen-ai/
- HTTP semantic conventions (stable): https://opentelemetry.io/docs/specs/semconv/http/
- Langfuse OTel ingest: https://langfuse.com/docs/integrations/opentelemetry
- Langtrace: https://docs.langtrace.ai
- Arize Phoenix OTel docs: https://arize.com/docs/phoenix
- Traceloop / OpenLLMetry: https://www.traceloop.com / https://github.com/traceloop/openllmetry
- W&B Weave: https://wandb.ai/site/weave
6 changes: 6 additions & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,12 @@ your IDE will autocomplete returned objects.
api/models
api/exceptions

.. toctree::
:maxdepth: 1
:caption: Design notes

design-notes/otel-instrumentation-analysis

Install
-------

Expand Down