
feat(ai-monitoring): Fetch model context size and rename task to fetch_ai_model_info#112656

Open
constantinius wants to merge 4 commits into master from
constantinius/feat/tasks/ai-agent-monitoring-fetch-llm-context-size

Conversation

@constantinius
Contributor

@constantinius constantinius commented Apr 10, 2026

Closes https://linear.app/getsentry/issue/TET-2219/sentry-map-llm-context-size-to-relay-cost-calculation-config

Extend the fetch_ai_model_costs task to also fetch context size (context window length) for each AI model alongside token costs. Context size is sourced from OpenRouter's context_length field and models.dev's limit.context field, following the same precedence logic as costs (OpenRouter takes priority).

The task is renamed from fetch_ai_model_costs to fetch_ai_model_info since it now fetches more than just cost data. The AIModelCostV2 type gains an optional contextSize field (int).

Updated references:

  • Task registration name in server.py cron schedule
  • Logger metric names in warning messages
  • All test imports, method names, and assertions

Updates the config schema passed to Relay to the following structure, now under the config field ai-model-info:v3:

   {
     "version": 3,
     "models": {
       "gpt-4": {
         "inputPerToken": 0.0000003,
         "outputPerToken": 0.00000165,
         "outputReasoningPerToken": 0.0,
         "inputCachedPerToken": 0.0000015,
         "inputCacheWritePerToken": 0.00001875,
         "contextSize": 1000000
       },
       "claude-3-5-sonnet": {
         "inputPerToken": 0.000003,
         "outputPerToken": 0.000015,
         "outputReasoningPerToken": 0.0,
         "inputCachedPerToken": 0.0000015,
         "inputCacheWritePerToken": 0.00000375
       }
     }
   }

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

@constantinius constantinius requested review from a team as code owners April 10, 2026 10:01
@linear-code

linear-code bot commented Apr 10, 2026

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2026
Comment on lines +20 to +26
class AIModelCostV2(TypedDict, total=False):
    inputPerToken: Required[float]
    outputPerToken: Required[float]
    outputReasoningPerToken: Required[float]
    inputCachedPerToken: Required[float]
    inputCacheWritePerToken: Required[float]
    contextSize: int
Contributor Author
Maybe we should update the config version, or use another structure.

Member
Maybe adding a new field in the config would be the best approach, because I can see us wanting to expand this even further for non-cost-related things. I am aware that it complicates this all a bit now, but it will allow us to make quick metadata changes in the future.

That new field could be called LLMModelMetadata and for each model would contain cost and context information for now, with a chance to expand it in the future.

Contributor Author

Added type defs for a new schema.


…acy task

Introduce a new LLMModelMetadata schema with costs nested under a
'costs' field and an optional contextSize. Context size is fetched from
OpenRouter's context_length and models.dev's limit.context fields.

Both tasks run independently on the same cron schedule:
- fetch_ai_model_costs -> writes ai-model-costs:v2 (flat AIModelCostV2)
- fetch_llm_model_metadata -> writes llm-model-metadata:v1 (nested LLMModelMetadata)

They share raw fetch helpers (_fetch_openrouter_raw, _fetch_models_dev_raw)
but format and cache independently. The old task + cache key will be
removed once all consumers have migrated.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment on lines +88 to +98
        return None

    cached_metadata = cache.get(LLM_MODEL_METADATA_CACHE_KEY)
    if cached_metadata is not None:
        return cached_metadata

    if not settings.IS_DEV:
        # in dev environment, we don't want to log this
        logger.warning("Empty LLM model metadata")

    return None
Contributor

Bug: The new llm_model_metadata_config function is defined but never called, so the fetched LLM model metadata is never added to the global config sent to Relay.
Severity: MEDIUM

Suggested Fix

Update get_global_config in src/sentry/relay/globalconfig.py to call llm_model_metadata_config. Add a new field to the GlobalConfig TypedDict to hold the LLM metadata, and then populate this field with the result from llm_model_metadata_config.

Location: src/sentry/relay/config/ai_model_costs.py#L78-L98

Potential issue: The `fetch_llm_model_metadata` task correctly fetches and caches LLM
model metadata, including context size. However, this data is never consumed by Relay.
The function `llm_model_metadata_config`, which reads from this cache, is defined but
never called. The central `get_global_config` function in
`src/sentry/relay/globalconfig.py` was not updated to invoke `llm_model_metadata_config`
and integrate its output. Consequently, the `GlobalConfig` sent to Relay lacks the new
model metadata, rendering the context-size fetching feature non-functional.


Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7c46e0c.

if model_id not in models_dict:
    cost = _models_dev_entry_to_cost(model_id, model_data)
    if cost is not None:
        models_dict[model_id] = cost
Contributor

Refactored models.dev duplicate handling changes precedence order

Low Severity

The refactoring of the models.dev data pipeline subtly changes duplicate-handling semantics for the legacy fetch_ai_model_costs task. Previously, _fetch_models_dev_models built an internal dict where later providers' entries overwrote earlier ones (last-wins for same model_id). The returned dict was then merged into the main dict. Now, _fetch_models_dev_raw returns a flat list preserving all entries, and the caller's if model_id not in models_dict guard means the first provider's entry wins instead of the last. This changes which pricing data is used when the same model ID appears under multiple models.dev providers.
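The semantic difference is easy to demonstrate (hypothetical entries; real data comes from models.dev providers):

```python
# Two providers report the same model_id with different prices.
entries = [
    ("provider-a", "gpt-4", {"inputPerToken": 1e-06}),
    ("provider-b", "gpt-4", {"inputPerToken": 2e-06}),
]

# Old behavior: build an intermediate dict, later entries overwrite (last wins).
last_wins: dict[str, dict] = {}
for _provider, model_id, cost in entries:
    last_wins[model_id] = cost

# New behavior: guard with `not in`, so the first entry is kept (first wins).
first_wins: dict[str, dict] = {}
for _provider, model_id, cost in entries:
    if model_id not in first_wins:
        first_wins[model_id] = cost
```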

Reviewed by Cursor Bugbot for commit 7c46e0c.

…acy task

Introduce a new LLMModelMetadata schema with costs nested under a
'costs' field and an optional contextSize. Context size is fetched from
OpenRouter's context_length and models.dev's limit.context fields.

Both tasks run independently on the same cron schedule:
- fetch_ai_model_costs -> writes ai-model-costs:v2 (flat AIModelCostV2)
- fetch_llm_model_metadata -> writes llm-model-metadata:v1 (nested LLMModelMetadata)

They share raw fetch helpers (_fetch_openrouter_raw, _fetch_models_dev_raw)
but format and cache independently. The old task + cache key will be
removed once all consumers have migrated.

GlobalConfig now serves both fields side by side:
- aiModelCosts: legacy flat format (TODO remove)
- llmModelMetadata: new nested format with contextSize

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Backend Test Failures

Failures on 80438bc in this run:

tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:72: in test_global_config
    assert normalized == config
E   AssertionError: assert {'aiModelCost....0, ...}, ...} == {'aiModelCost.....]}]}}}, ...}
E     
E     Omitting 5 identical items, use -vv to show
E     Right contains 1 more item:
E     {'llmModelMetadata': None}
E     
E     Full diff:
E       {
E           'aiModelCosts': None,
E     -     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'app_start_warm',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'cls',
E                       'unit': 'none',
E                   },
E                   {
E                       'name': 'connection.rtt',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fcp',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fid',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fp',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'frames_frozen_rate',
E                       'unit': 'ratio',
E                   },
E                   {
E                       'name': 'frames_frozen',
E                       'unit': 'none',
... (1493 more lines)
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config_valid_with_generic_filters
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:127: in test_global_config_valid_with_generic_filters
    assert config == normalize_global_config(config)
E   AssertionError: assert {'aiModelCost...ts': 10}, ...} == {'aiModelCost.....]}]}}}, ...}
E     
E     Omitting 5 identical items, use -vv to show
E     Left contains 1 more item:
E     {'llmModelMetadata': None}
E     
E     Full diff:
E       {
E           'aiModelCosts': None,
E           'filters': {
E               'filters': [
E                   {
E                       'condition': {
E                           'inner': {
E                               'name': 'event.contexts.browser.name',
E                               'op': 'eq',
E                               'value': 'Firefox',
E                           },
E                           'op': 'not',
E                       },
E                       'id': 'test-id',
E                       'isEnabled': True,
E                   },
E               ],
E               'version': 1,
E           },
E     +     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'app_start_warm',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'cls',
E                       'unit': 'none',
E                   },
E                   {
E                       'name': 'connection.rtt',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fcp',
... (1483 more lines)

Relay's normalize_global_config strips unknown fields, causing
test_relay_globalconfig_v3 failures. The new cache is still populated
and readable via llm_model_metadata_config() but should not be added
to Relay's GlobalConfig until Relay supports the field.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment on lines +358 to +359
models_dict[model_id] = metadata
except Exception as e:
Contributor

Bug: The code uses a strict isinstance(..., int) check for context size, which will silently drop the value if an API returns it as a float (e.g., 1000000.0).
Severity: MEDIUM

Suggested Fix

Modify the type check to be more robust. Instead of a strict isinstance(..., int) check, first verify if the value is an instance of (int, float). If it is, convert it to an integer before assigning it to the metadata. This will correctly handle both integer and float representations of the context size.


Location: src/sentry/tasks/ai_agent_monitoring.py#L358-L359

Potential issue: In `_openrouter_entry_to_metadata` and `_models_dev_entry_to_metadata`,
the context size from external APIs is validated using a strict `isinstance(..., int)`
check. If an API returns the context size as a float (e.g., `1000000.0` instead of
`1000000`), this check will fail. The code then silently ignores the context size,
failing to add it to the model's metadata. This results in a silent loss of data, as
there is no logging to indicate that a valid, albeit float-formatted, context size was
discarded. This could lead to models being configured in Relay without their context
size, even when the data is available from the source API.

