
feat(ai-monitoring): Fetch model context size and rename task to fetch_ai_model_info#112656

Open
constantinius wants to merge 4 commits into master from
constantinius/feat/tasks/ai-agent-monitoring-fetch-llm-context-size

Conversation

@constantinius
Contributor

@constantinius constantinius commented Apr 10, 2026

Closes https://linear.app/getsentry/issue/TET-2219/sentry-map-llm-context-size-to-relay-cost-calculation-config

Extend the fetch_ai_model_costs task to also fetch context size (context window length) for each AI model alongside token costs. Context size is sourced from OpenRouter's context_length field and models.dev's limit.context field, following the same precedence logic as costs (OpenRouter takes priority).

The task is renamed from fetch_ai_model_costs to fetch_ai_model_info since it now fetches more than just cost data. The AIModelCostV2 type gains an optional contextSize field (int).

Updated references:

  • Task registration name in server.py cron schedule
  • Logger metric names in warning messages
  • All test imports, method names, and assertions

Updates the config schema passed to Relay to the following structure, now under the config field ai-model-info:v3:

   {
     "version": 3,
     "models": {
       "gpt-4": {
         "inputPerToken": 0.0000003,
         "outputPerToken": 0.00000165,
         "outputReasoningPerToken": 0.0,
         "inputCachedPerToken": 0.0000015,
         "inputCacheWritePerToken": 0.00001875,
         "contextSize": 1000000
       },
       "claude-3-5-sonnet": {
         "inputPerToken": 0.000003,
         "outputPerToken": 0.000015,
         "outputReasoningPerToken": 0.0,
         "inputCachedPerToken": 0.0000015,
         "inputCacheWritePerToken": 0.00000375
       }
     }
   }

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

@constantinius constantinius requested review from a team as code owners April 10, 2026 10:01
@linear-code

linear-code bot commented Apr 10, 2026

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 10, 2026
Comment on lines +20 to +26
class AIModelCostV2(TypedDict, total=False):
    inputPerToken: Required[float]
    outputPerToken: Required[float]
    outputReasoningPerToken: Required[float]
    inputCachedPerToken: Required[float]
    inputCacheWritePerToken: Required[float]
    contextSize: int
Contributor Author
Maybe we should update the config version, or use another structure.

Member
Maybe adding a new field in the config would be the best approach, because I can see us wanting to expand this even further for non-cost-related things. I am aware that it complicates this all a bit now, but it will allow us to make quick metadata changes in the future.

That new field could be called LLMModelMetadata and for each model would contain cost and context information for now, with a chance to expand it in the future.

Contributor Author

Added type defs for a new schema.


…acy task

Introduce a new LLMModelMetadata schema with costs nested under a
'costs' field and an optional contextSize. Context size is fetched from
OpenRouter's context_length and models.dev's limit.context fields.

Both tasks run independently on the same cron schedule:
- fetch_ai_model_costs -> writes ai-model-costs:v2 (flat AIModelCostV2)
- fetch_llm_model_metadata -> writes llm-model-metadata:v1 (nested LLMModelMetadata)

They share raw fetch helpers (_fetch_openrouter_raw, _fetch_models_dev_raw)
but format and cache independently. The old task + cache key will be
removed once all consumers have migrated.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment on lines +88 to +98
        return None

    cached_metadata = cache.get(LLM_MODEL_METADATA_CACHE_KEY)
    if cached_metadata is not None:
        return cached_metadata

    if not settings.IS_DEV:
        # in dev environment, we don't want to log this
        logger.warning("Empty LLM model metadata")

    return None
Contributor

Bug: The new llm_model_metadata_config function is defined but never called, so the fetched LLM model metadata is never added to the global config sent to Relay.
Severity: MEDIUM

Suggested Fix

Update get_global_config in src/sentry/relay/globalconfig.py to call llm_model_metadata_config. Add a new field to the GlobalConfig TypedDict to hold the LLM metadata, and then populate this field with the result from llm_model_metadata_config.

Location: src/sentry/relay/config/ai_model_costs.py#L78-L98

Potential issue: The `fetch_llm_model_metadata` task correctly fetches and caches LLM
model metadata, including context size. However, this data is never consumed by Relay.
The function `llm_model_metadata_config`, which reads from this cache, is defined but
never called. The central `get_global_config` function in
`src/sentry/relay/globalconfig.py` was not updated to invoke `llm_model_metadata_config`
and integrate its output. Consequently, the `GlobalConfig` sent to Relay lacks the new
model metadata, rendering the context-size fetching feature non-functional.


Contributor

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 7c46e0c.

if model_id not in models_dict:
    cost = _models_dev_entry_to_cost(model_id, model_data)
    if cost is not None:
        models_dict[model_id] = cost
Contributor

Refactored models.dev duplicate handling changes precedence order

Low Severity

The refactoring of the models.dev data pipeline subtly changes duplicate-handling semantics for the legacy fetch_ai_model_costs task. Previously, _fetch_models_dev_models built an internal dict where later providers' entries overwrote earlier ones (last-wins for same model_id). The returned dict was then merged into the main dict. Now, _fetch_models_dev_raw returns a flat list preserving all entries, and the caller's if model_id not in models_dict guard means the first provider's entry wins instead of the last. This changes which pricing data is used when the same model ID appears under multiple models.dev providers.
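The semantic difference is easy to demonstrate (hypothetical entries; real data comes from models.dev providers):

```python
# Two providers report the same model_id with different prices.
entries = [
    ("provider-a", "gpt-4", {"inputPerToken": 1e-06}),
    ("provider-b", "gpt-4", {"inputPerToken": 2e-06}),
]

# Old behavior: build an intermediate dict, later entries overwrite (last wins).
last_wins: dict[str, dict] = {}
for _provider, model_id, cost in entries:
    last_wins[model_id] = cost

# New behavior: guard with `not in`, so the first entry is kept (first wins).
first_wins: dict[str, dict] = {}
for _provider, model_id, cost in entries:
    if model_id not in first_wins:
        first_wins[model_id] = cost
```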

Reviewed by Cursor Bugbot for commit 7c46e0c.

…acy task

Introduce a new LLMModelMetadata schema with costs nested under a
'costs' field and an optional contextSize. Context size is fetched from
OpenRouter's context_length and models.dev's limit.context fields.

Both tasks run independently on the same cron schedule:
- fetch_ai_model_costs -> writes ai-model-costs:v2 (flat AIModelCostV2)
- fetch_llm_model_metadata -> writes llm-model-metadata:v1 (nested LLMModelMetadata)

They share raw fetch helpers (_fetch_openrouter_raw, _fetch_models_dev_raw)
but format and cache independently. The old task + cache key will be
removed once all consumers have migrated.

GlobalConfig now serves both fields side by side:
- aiModelCosts: legacy flat format (TODO remove)
- llmModelMetadata: new nested format with contextSize

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Backend Test Failures

Failures on 80438bc in this run:

tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:72: in test_global_config
    assert normalized == config
E   AssertionError: assert {'aiModelCost....0, ...}, ...} == {'aiModelCost.....]}]}}}, ...}
E     
E     Omitting 5 identical items, use -vv to show
E     Right contains 1 more item:
E     {'llmModelMetadata': None}
E     
E     Full diff:
E       {
E           'aiModelCosts': None,
E     -     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'app_start_warm',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'cls',
E                       'unit': 'none',
E                   },
E                   {
E                       'name': 'connection.rtt',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fcp',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fid',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fp',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'frames_frozen_rate',
E                       'unit': 'ratio',
E                   },
E                   {
E                       'name': 'frames_frozen',
E                       'unit': 'none',
... (1493 more lines)
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py::test_global_config_valid_with_generic_filters
[gw0] linux -- Python 3.13.1 /home/runner/work/sentry/sentry/.venv/bin/python3
tests/sentry/api/endpoints/test_relay_globalconfig_v3.py:127: in test_global_config_valid_with_generic_filters
    assert config == normalize_global_config(config)
E   AssertionError: assert {'aiModelCost...ts': 10}, ...} == {'aiModelCost.....]}]}}}, ...}
E     
E     Omitting 5 identical items, use -vv to show
E     Left contains 1 more item:
E     {'llmModelMetadata': None}
E     
E     Full diff:
E       {
E           'aiModelCosts': None,
E           'filters': {
E               'filters': [
E                   {
E                       'condition': {
E                           'inner': {
E                               'name': 'event.contexts.browser.name',
E                               'op': 'eq',
E                               'value': 'Firefox',
E                           },
E                           'op': 'not',
E                       },
E                       'id': 'test-id',
E                       'isEnabled': True,
E                   },
E               ],
E               'version': 1,
E           },
E     +     'llmModelMetadata': None,
E           'measurements': {
E               'builtinMeasurements': [
E                   {
E                       'name': 'app_start_cold',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'app_start_warm',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'cls',
E                       'unit': 'none',
E                   },
E                   {
E                       'name': 'connection.rtt',
E                       'unit': 'millisecond',
E                   },
E                   {
E                       'name': 'fcp',
... (1483 more lines)

Relay's normalize_global_config strips unknown fields, causing
test_relay_globalconfig_v3 failures. The new cache is still populated
and readable via llm_model_metadata_config() but should not be added
to Relay's GlobalConfig until Relay supports the field.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Comment on lines +358 to +359
models_dict[model_id] = metadata
except Exception as e:
Contributor

Bug: The code uses a strict isinstance(..., int) check for context size, which will silently drop the value if an API returns it as a float (e.g., 1000000.0).
Severity: MEDIUM

Suggested Fix

Modify the type check to be more robust. Instead of a strict isinstance(..., int) check, first verify if the value is an instance of (int, float). If it is, convert it to an integer before assigning it to the metadata. This will correctly handle both integer and float representations of the context size.


Location: src/sentry/tasks/ai_agent_monitoring.py#L358-L359

Potential issue: In `_openrouter_entry_to_metadata` and `_models_dev_entry_to_metadata`,
the context size from external APIs is validated using a strict `isinstance(..., int)`
check. If an API returns the context size as a float (e.g., `1000000.0` instead of
`1000000`), this check will fail. The code then silently ignores the context size,
failing to add it to the model's metadata. This results in a silent loss of data, as
there is no logging to indicate that a valid, albeit float-formatted, context size was
discarded. This could lead to models being configured in Relay without their context
size, even when the data is available from the source API.

