azure-monitor-opentelemetry-exporter: oversized gen_ai content truncates customDimensions and drops token-usage dimensions #47345

- **Package Name**: azure-monitor-opentelemetry-exporter
- **Package Version**: 1.0.0b52
- **Operating System**: macOS 15.5 (arm64)
- **Python Version**: 3.12.13

**Describe the bug**
When a span carries large GenAI semantic-convention content (e.g. `gen_ai.input.messages`), the queryable `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` dimensions are silently lost on that span in Application Insights, so token usage is undercounted or misreported.

The exporter forwards GenAI content attributes up to a **256 KB** limit (the GenAI-attribute exemption in `_filter_custom_properties`, `azure/monitor/opentelemetry/exporter/_utils.py`). However, the ingested row's `customDimensions` is truncated at ~**64 KB** (we consistently observe `strlen(tostring(customDimensions)) == 65532`). Because the truncation cuts the serialized property bag at that boundary, any attribute serialized **after** the large content value, including the tiny but high-value `gen_ai.usage.*`, is dropped and becomes non-queryable.
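
The mechanism described above can be modeled locally with the standard library. This is a hedged simulation, not the exporter's or the ingestion pipeline's real code: it assumes `customDimensions` is the JSON serialization of the span's properties in attribute-insertion order, and that ingestion hard-cuts that string at the observed 65,532 characters. `clip_like_ingestion`, `surviving_keys`, and `OBSERVED_CUSTOM_DIMENSIONS_LIMIT` are hypothetical names.

```python
import json

# Assumption: the cut-off observed in Application Insights (65532 characters),
# not a documented constant.
OBSERVED_CUSTOM_DIMENSIONS_LIMIT = 65_532


def clip_like_ingestion(properties: dict) -> str:
    """Hypothetical model: serialize in insertion order, then hard-cut the string."""
    return json.dumps(properties)[:OBSERVED_CUSTOM_DIMENSIONS_LIMIT]


def surviving_keys(properties: dict, clipped: str) -> list:
    """Keys whose quoted name still appears in the clipped blob."""
    return [key for key in properties if json.dumps(key) in clipped]


props = {
    "gen_ai.system": "openai",
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.input.messages": "[" + "x" * 71_000 + "]",  # ~71 KB: under 256 KB, over 64 KB
    "gen_ai.usage.input_tokens": 71_000,
    "gen_ai.usage.output_tokens": 7,
}

clipped = clip_like_ingestion(props)
print(len(clipped))                    # 65532
print(surviving_keys(props, clipped))  # usage keys are missing
# ['gen_ai.system', 'gen_ai.request.model', 'gen_ai.input.messages']

# Same bag with the large value moved last (False sorts before True, sort is stable).
reordered = dict(sorted(props.items(), key=lambda kv: kv[0] == "gen_ai.input.messages"))
print(surviving_keys(reordered, clip_like_ingestion(reordered)))
# all five keys, with 'gen_ai.input.messages' last
```

Under this model, moving the large value last keeps the usage keys. Whether the exporter actually preserves attribute order when building the property bag is not verified here.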

**To Reproduce**
1. Send one large and one small GenAI span through the raw `AzureMonitorTraceExporter`:

   ```python
   import os

   from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
   from opentelemetry import trace
   from opentelemetry.sdk.resources import Resource
   from opentelemetry.sdk.trace import TracerProvider
   from opentelemetry.sdk.trace.export import BatchSpanProcessor

   # Use the raw exporter directly (no distro, no extra processors).
   provider = TracerProvider(resource=Resource.create({"service.name": "genai-repro"}))
   provider.add_span_processor(
       BatchSpanProcessor(
           AzureMonitorTraceExporter.from_connection_string(os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"])
       )
   )
   trace.set_tracer_provider(provider)
   tracer = trace.get_tracer("repro")

   # One span whose input content exceeds ~64 KB, and one small control span.
   for name, content, in_tok in (
       ("chat repro-big", "[" + "x" * 71_000 + "]", 71_000),   # ~71 KB content
       ("chat repro-small", '[{"role":"user","content":"hi"}]', 5),
   ):
       with tracer.start_as_current_span(name) as span:
           span.set_attribute("gen_ai.system", "openai")
           span.set_attribute("gen_ai.request.model", "gpt-4o")
           span.set_attribute("gen_ai.input.messages", content)      # large value first...
           span.set_attribute("gen_ai.usage.input_tokens", in_tok)   # ...usage set after it
           span.set_attribute("gen_ai.usage.output_tokens", 7)

   # Flush and shut down so both spans are exported before the process exits.
   provider.force_flush()
   provider.shutdown()
   ```

2. Wait ~2 minutes for ingestion, then run this query in the Application Insights **Logs** view:

   ```kusto
   dependencies
   | where timestamp > ago(30m) and name startswith "chat repro"
   | extend cd = tostring(customDimensions)
   | project name,
             usage_tokens_queryable = tostring(customDimensions["gen_ai.usage.input_tokens"]),
             input_msgs_len         = strlen(tostring(customDimensions["gen_ai.input.messages"])),
             customDimensions_len   = strlen(cd)
   | order by name asc
   ```

3. Observe that the large span is present (confirmable by `operation_Id`) but its token dimension is gone:

   | name | usage_tokens_queryable | input_msgs_len | customDimensions_len |
   |---|---|---|---|
   | `chat repro-big` | *(empty)* | ~64000 | **65532** |
   | `chat repro-small` | `5` | ~30 | ~200 |

**Expected behavior**
A span exceeding the ingestion limit should not silently lose its small, high-value dimensions. `customDimensions` should stay valid JSON with all properties queryable: truncate the oversized value (`gen_ai.input.messages`) rather than clipping the whole blob into a string that drops the other dimensions.
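
The expected behavior amounts to a per-value truncation policy. A minimal sketch of such a policy follows, assuming a character budget equal to the observed 65,532-character cut-off (the real limit may be measured in bytes or applied differently). `fit_property_bag`, `BUDGET`, and the `...[truncated]` marker are hypothetical and not part of the `azure-monitor-opentelemetry-exporter` API.

```python
import json

# Assumption: the observed customDimensions cut-off, not a documented limit.
BUDGET = 65_532
MARKER = "...[truncated]"  # hypothetical suffix marking a clipped value


def fit_property_bag(properties: dict, budget: int = BUDGET) -> dict:
    """Shorten the largest string values until json.dumps(result) fits in budget.

    Every key is kept, so small dimensions such as gen_ai.usage.* stay queryable
    and the serialized result is still valid JSON.
    """
    result = dict(properties)
    while len(json.dumps(result)) > budget:
        string_keys = [k for k, v in result.items() if isinstance(v, str)]
        if not string_keys:
            break  # only non-string values left; nothing to shorten
        key = max(string_keys, key=lambda k: len(result[k]))
        if len(result[key]) <= len(MARKER):
            break  # the largest value is already minimal
        overflow = len(json.dumps(result)) - budget
        keep = max(0, len(result[key]) - overflow - len(MARKER))
        result[key] = result[key][:keep] + MARKER
    return result


props = {
    "gen_ai.system": "openai",
    "gen_ai.input.messages": "[" + "x" * 71_000 + "]",
    "gen_ai.usage.input_tokens": 71_000,
    "gen_ai.usage.output_tokens": 7,
}
fitted = fit_property_bag(props)
print(len(json.dumps(fitted)))                # 65532
print(fitted["gen_ai.usage.input_tokens"])    # 71000
print(fitted["gen_ai.input.messages"][-14:])  # ...[truncated]
```

Shrinking the largest value first keeps the loss confined to the content attribute that caused the overflow; each pass strictly shortens one value, so the loop always terminates.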



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

azure-monitor-opentelemetry-exporter: oversized gen_ai content truncates customDimensions and drops token-usage dimensions #47345

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

name	usage_tokens_queryable	input_msgs_len	customDimensions_len
`chat repro-big`	(empty)	~64000	65532
`chat repro-small`	`5`	~30	~200

azure-monitor-opentelemetry-exporter: oversized gen_ai content truncates customDimensions and drops token-usage dimensions #47345

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions