fix(langchain): handle Anthropic cache_creation nested-dict in _parse_usage_model by ASAD-BE18 · Pull Request #1698 · langfuse/langfuse-python

ASAD-BE18 · 2026-06-10T09:01:08Z

What changed

_parse_usage_model in langfuse/langchain/CallbackHandler.py now handles Anthropic's cache_creation field when it arrives as a nested dict (tiered prompt-caching API, langchain-anthropic ≥ 0.3.x).

Root cause

Anthropic's extended prompt-caching API includes a cache_creation key in the LLM output usage dict:

{
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 500,
        "ephemeral_5m_input_tokens": 0,
    },
}

v2.x (≤ 2.60.10) — The old final filter (v is not None and not isinstance(v, str)) let the nested dict pass through into UpdateGenerationBody.usageDetails, which is typed as Union[Dict[str, int], ...]. Pydantic raised a ValidationError ("value is not a valid integer") that was swallowed by the ingestion queue, silently dropping the entire generation end() event — every trace had endTime=null and input=0 / output=0.

v4.x (HEAD before this PR) — The isinstance(v, int) filter correctly prevented the crash, but the cache_creation dict was silently discarded, losing all cache-creation token data.
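To make the two failure modes concrete, the sketch below applies both final filters to the usage dict shown above. The filter conditions are taken from this description; the variable names and the dict-comprehension form are illustrative assumptions, not the handler's actual code.

```python
# Usage payload from the example above (nested cache_creation dict).
usage = {
    "input_tokens": 9454,
    "output_tokens": 380,
    "cache_read_input_tokens": 0,
    "cache_creation": {
        "ephemeral_1h_input_tokens": 500,
        "ephemeral_5m_input_tokens": 0,
    },
}

# v2.x-style filter: drops only None and str values, so the nested dict survives
# and would reach a field typed as Dict[str, int].
v2_filtered = {
    k: v for k, v in usage.items() if v is not None and not isinstance(v, str)
}

# v4.x-style filter: keeps only int values, so the nested dict is dropped
# together with the per-tier token counts it carries.
v4_filtered = {k: v for k, v in usage.items() if isinstance(v, int)}

print("cache_creation" in v2_filtered)  # True
print("cache_creation" in v4_filtered)  # False
```

Neither filter preserves the tier counts as integers, which is why the fix has to run before the final filter.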

Fix

Before the final isinstance(v, int) filter, pop cache_creation and flatten each per-tier value into a cache_creation_{tier_key} entry (matching the pattern already used for input_token_details). If the total is non-zero, also set cache_creation_input_tokens via setdefault so the legacy scalar field is not overwritten if already present.

# Anthropic extended prompt caching: cache_creation is a dict keyed by cache tier.
if "cache_creation" in usage_model and isinstance(usage_model["cache_creation"], dict):
    cache_creation = usage_model.pop("cache_creation")
    total = 0
    for tier_key, tier_val in cache_creation.items():
        if isinstance(tier_val, int):
            usage_model[f"cache_creation_{tier_key}"] = tier_val
            total += tier_val
    if total > 0:
        usage_model.setdefault("cache_creation_input_tokens", total)
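For a quick check outside the handler, the same logic can be wrapped in a standalone function. The name `flatten_cache_creation` is hypothetical (in this PR the block lives inline in `_parse_usage_model`), and the sketch copies its input so the caller's dict is not mutated.

```python
from typing import Any, Dict


def flatten_cache_creation(usage: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical standalone version of the flattening block above."""
    usage_model = dict(usage)  # shallow copy; the original dict is left untouched
    if isinstance(usage_model.get("cache_creation"), dict):
        cache_creation = usage_model.pop("cache_creation")
        total = 0
        for tier_key, tier_val in cache_creation.items():
            # Non-int tier values are skipped, as in the PR's block.
            if isinstance(tier_val, int):
                usage_model[f"cache_creation_{tier_key}"] = tier_val
                total += tier_val
        if total > 0:
            # setdefault keeps an existing legacy scalar if one is present.
            usage_model.setdefault("cache_creation_input_tokens", total)
    return usage_model


flattened = flatten_cache_creation(
    {
        "input_tokens": 9454,
        "output_tokens": 380,
        "cache_read_input_tokens": 0,
        "cache_creation": {
            "ephemeral_1h_input_tokens": 500,
            "ephemeral_5m_input_tokens": 0,
        },
    }
)
print(flattened["cache_creation_ephemeral_1h_input_tokens"])  # 500
print(flattened["cache_creation_input_tokens"])  # 500
```

After flattening, every value is an int, so the final `isinstance(v, int)` filter keeps all of the cache-creation data.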

Tests added (`tests/unit/test_parse_usage_model.py`)

| Test | What it verifies |
| --- | --- |
| `test_anthropic_cache_creation_dict_flattened` | Per-tier keys present; aggregate = sum of tiers; original dict key absent |
| `test_anthropic_cache_creation_all_zeros_no_aggregate` | Zero-value tiers stored; no aggregate added when total is zero |
| `test_anthropic_cache_creation_legacy_field_not_overwritten` | `setdefault` preserves existing `cache_creation_input_tokens` value |

Existing test_standard_tier_input_token_details and test_priority_tier_not_subtracted still pass unchanged.

Verification

uv run --frozen pytest tests/unit/test_parse_usage_model.py -v
uv run --frozen ruff check langfuse/langchain/CallbackHandler.py tests/unit/test_parse_usage_model.py
uv run --frozen ruff format --check langfuse/langchain/CallbackHandler.py tests/unit/test_parse_usage_model.py

All pass. No live-provider tests are needed since the fix is unit-testable.

Greptile Summary

This PR fixes _parse_usage_model in langfuse/langchain/CallbackHandler.py to correctly handle the Anthropic extended prompt-caching API where cache_creation arrives as a nested dict of per-tier token counts rather than a scalar integer.

The new block pops cache_creation when it is a dict, flattens each tier into a cache_creation_{tier_key} key, and uses setdefault to set the legacy aggregate cache_creation_input_tokens only when the total is non-zero, preserving any pre-existing scalar value.
Three focused unit tests cover the flattening, the all-zeros case, and the legacy-field preservation case.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to a single parsing helper, all existing tests still pass, and three new unit tests cover the key scenarios introduced by this fix.

The fix is a well-contained addition inside an already-guarded if isinstance(usage_model, dict) block, the logic is straightforward dict-flattening with no side effects on other fields, and the setdefault correctly prevents clobbering an existing scalar. The one theoretical gap — a non-integer tier value being silently skipped — mirrors how the rest of the function already handles unexpected types via the final isinstance(v, int) filter.

No files require special attention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_parse_usage_model called] --> B{has __dict__?}
    B -- yes --> C[usage equals usage.__dict__]
    B -- no --> D[usage_model equals usage.copy]
    C --> D
    D --> E{OpenAI format?}
    E -- yes --> F[return usage_model as-is]
    E -- no --> G[Apply conversion_list key renames]
    G --> H[Flatten input_token_details]
    H --> I[Flatten output_token_details]
    I --> J[Flatten Vertex AI token details]
    J --> N{cache_creation present AND is dict?}
    N -- no --> P
    N -- yes --> O[Pop cache_creation dict and flatten tier keys]
    O --> O2{total greater than 0?}
    O2 -- yes --> O3[setdefault cache_creation_input_tokens equals total]
    O2 -- no --> P
    O3 --> P[Final filter: keep only int values]
    P --> Q{usage_model empty?}
    Q -- yes --> R[return None]
    Q -- no --> S[return usage_model]

…_usage_model

Anthropic's extended prompt-caching API returns cache_creation as a nested dict keyed by cache tier (e.g. {"ephemeral_1h_input_tokens": 500, "ephemeral_5m_input_tokens": 0}) inside the LLM output usage object. Previously this value was passed through to UpdateGenerationBody.usageDetails unchanged.

On v2.x (final filter: not None/not str) the nested dict survived, causing a Pydantic ValidationError ("value is not a valid integer") that was silently swallowed by the ingestion queue, dropping the entire generation end() event and leaving every trace with endTime=null and input/output=0.

On v4.x (final filter: isinstance(v, int)) the nested dict was silently discarded, losing all cache-creation token data.

Fix: before the final int-filter, pop cache_creation, flatten each tier value into a cache_creation_{tier_key} entry, and set the aggregated cache_creation_input_tokens total via setdefault so the legacy field is not overwritten if already present.

Closes langfuse#1697

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

CLAassistant · 2026-06-10T09:01:15Z

All committers have signed the CLA.

hassiebp · 2026-06-12T09:45:08Z

@claude review

claude

LGTM — small, well-scoped fix that flattens Anthropic's nested cache_creation dict; production logic is correct and the inline nit is a test-quality improvement, not a blocker.

Extended reasoning...

Overview

This PR adds a small block inside _parse_usage_model in langfuse/langchain/CallbackHandler.py to handle Anthropic's extended prompt-caching API, where cache_creation arrives as a nested dict (per-tier token counts) rather than a scalar. The block pops the dict, flattens each tier into cache_creation_{tier_key} (mirroring the existing input_token_details pattern), and uses setdefault to expose an aggregate cache_creation_input_tokens without clobbering a pre-existing scalar. Three new unit tests cover the flatten case, all-zeros case, and legacy-field preservation; existing tier tests are unchanged.

Security risks

None. The change is a pure dict-transformation with isinstance guards; no I/O, auth, crypto, or untrusted-input parsing surface is touched.

Level of scrutiny

Low. This is a self-contained parsing helper change behind isinstance(usage_model, dict) and isinstance(usage_model["cache_creation"], dict) guards, with isinstance(int) per-tier filtering that matches the function's existing defensive style. The production logic correctly handles the three relevant cases (flatten, all-zeros aggregate suppression, legacy-field preservation).

Other factors

The single finding is a nit about test_anthropic_cache_creation_legacy_field_not_overwritten — the legacy value (300) equals the tier sum (300), so the test would also pass under a regressed plain-assignment implementation. The production code is correct; this is purely a test-discrimination improvement worth picking up but not blocking. The maintainer explicitly requested review and there are no outstanding human-reviewer comments.
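The discrimination point can be illustrated with two toy aggregators. Both function names are hypothetical: one stands in for the `setdefault` behaviour, the other for a regressed plain assignment. The tier values 200 and 100 come from the test described here; which tier key carries which value is an assumption.

```python
def aggregate_setdefault(usage, tiers):
    """Stand-in for the correct behaviour: keep an existing legacy value."""
    out = dict(usage)
    out.setdefault("cache_creation_input_tokens", sum(tiers.values()))
    return out


def aggregate_assign(usage, tiers):
    """Stand-in for a regression: always overwrite the legacy value."""
    out = dict(usage)
    out["cache_creation_input_tokens"] = sum(tiers.values())
    return out


tiers = {"ephemeral_1h_input_tokens": 200, "ephemeral_5m_input_tokens": 100}

# Legacy value equal to the tier sum: both implementations give the same result,
# so an assertion on this input cannot tell them apart.
weak = {"cache_creation_input_tokens": 300}
same_on_weak = aggregate_setdefault(weak, tiers) == aggregate_assign(weak, tiers)

# Legacy value distinct from the tier sum: only the setdefault version keeps 999.
strong = {"cache_creation_input_tokens": 999}
same_on_strong = aggregate_setdefault(strong, tiers) == aggregate_assign(strong, tiers)

print(same_on_weak, same_on_strong)  # True False
```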

claude Bot reviewed Jun 10, 2026

Merge branch 'main' into fix/anthropic-cache-creation-usage

348a4ab

claude Bot reviewed Jun 12, 2026

Comment thread tests/unit/test_parse_usage_model.py

ASAD-BE18 added 4 commits June 12, 2026 14:56

Merge branch 'main' into fix/anthropic-cache-creation-usage

7ce5654

test(langchain): use distinct legacy value in setdefault test

fe3af4d

The legacy value (300) equalled the tier sum (200+100=300), so the assertion would pass even if setdefault were replaced with a plain assignment. Use 999 to make the test discriminating.

Merge branch 'main' into fix/anthropic-cache-creation-usage

d2d1432

Merge branch 'main' into fix/anthropic-cache-creation-usage

830e877

ASAD-BE18 wants to merge 6 commits into langfuse:main from ASAD-BE18:fix/anthropic-cache-creation-usage
