
feat(deepinfra): update model YAMLs [bot] #913

Merged: LordGameleo merged 3 commits into main from bot/update-deepinfra-20260502-021707 on May 5, 2026.

Conversation

@harshiv-26 (Collaborator) commented May 2, 2026

Auto-generated by poc-agent for provider deepinfra.


Note

Medium Risk
Updates model metadata that affects advertised modalities and per-token/per-image pricing, which can change request routing/validation and cost estimation. No code changes, but mis-specified YAML values could impact downstream billing and capability checks.

Overview
Updates multiple DeepInfra model YAMLs to reflect new pricing (token/image rates, including adding output_cost_per_image for FLUX-2-pro and lowering the sdxl-turbo image cost).

Adjusts capability metadata by removing erroneous image input modalities from several embedding/image models, adding text input to Bria/replace_background, and adding video input support to several Qwen3.5 chat models. Also marks some models explicitly status: active and adds a deprecationDate to PaddleOCR-VL-0.9B.
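As a hedged illustration of the kind of edits described above (field names and values are inferred from this description, not copied from the actual diff, and the real schema may differ):

```yaml
# Hypothetical sketch only — names and values are illustrative.
model: black-forest-labs/FLUX-2-pro
status: active
modalities:
  input: [text]
  output: [image]
pricing:
  output_cost_per_image: 0.03   # illustrative value
---
model: PaddlePaddle/PaddleOCR-VL-0.9B
status: active
deprecationDate: "2026-12-31"   # illustrative date
```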

Reviewed by Cursor Bugbot for commit 8743ec9.


github-actions Bot commented May 2, 2026

/test-models

@harshiv-26 (Collaborator, Author) commented:

Gateway test results

  • Total: 53
  • Passed: 37
  • Failed: 3
  • Validation failed: 2
  • Errored: 0
  • Skipped: 11
  • Success rate: 88.1%
| Provider | Model | Scenarios |
| --- | --- | --- |
| deepinfra | BAAI/bge-base-en-v1.5 | success: params |
| deepinfra | BAAI/bge-en-icl | success: params |
| deepinfra | Bria/fibo | skipped: skip-check |
| deepinfra | Bria/replace_background | skipped: skip-check |
| deepinfra | MiniMaxAI/MiniMax-M2.5 | success: tool-call, params:stream, tool-call:stream, params, reasoning:stream, reasoning<br>validation_failure: json-output, json-output:stream |
| deepinfra | PaddlePaddle/PaddleOCR-VL-0.9B | success: json-output:stream, params:stream, params<br>failure: json-output |
| deepinfra | Qwen/Qwen3.5-35B-A3B | success: tool-call:stream, tool-call, params, params:stream, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-397B-A17B | success: tool-call:stream, tool-call, params:stream, params, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-4B | success: tool-call:stream, tool-call, params, params:stream, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | black-forest-labs/FLUX-1.1-pro | skipped: skip-check |
| deepinfra | black-forest-labs/FLUX-2-pro | skipped: skip-check |
| deepinfra | sentence-transformers/multi-qa-mpnet-base-dot-v1 | success: params |
| deepinfra | shibing624/text2vec-base-chinese | success: params |
| deepinfra | stabilityai/sdxl-turbo | skipped: skip-check |
| deepinfra | stepfun-ai/Step-3.5-Flash | success: tool-call:stream, tool-call, params:stream, params, reasoning:stream, reasoning<br>failure: json-output:stream, json-output |
Failures (5)

deepinfra/PaddlePaddle/PaddleOCR-VL-0.9B — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpum3p5xs4/snippet.py", line 22, in <module>
    _json.loads(_content)
  File "/usr/local/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting ':' delimiter: line 376 column 1 (char 25770)
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/PaddlePaddle-PaddleOCR-VL-0.9B",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
```
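A JSONDecodeError at character 25770 points at a very long response, so the model may have been cut off at the token limit rather than having emitted structurally bad JSON. A minimal diagnostic sketch, assuming an OpenAI-style choice object (this helper is not part of the test harness):

```python
# Minimal sketch: distinguish "model emitted invalid JSON" from
# "output was truncated at max_tokens" before failing a json-output check.
# Assumes an OpenAI-style choice with .finish_reason and .message.content.
import json
from types import SimpleNamespace as NS


def classify_json_failure(choice):
    """Return 'ok', 'truncated', or 'invalid-json' for a chat completion choice."""
    content = choice.message.content or ""
    try:
        json.loads(content)
        return "ok"
    except json.JSONDecodeError:
        # finish_reason == "length" means generation hit the token limit,
        # so the JSON was most likely cut off mid-object.
        if choice.finish_reason == "length":
            return "truncated"
        return "invalid-json"


# Offline usage example with a stand-in object:
cut_off = NS(finish_reason="length", message=NS(content='{"colors": [{"name": "re'))
print(classify_json_failure(cut_off))  # -> truncated
```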

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpfnmruw6d/snippet.py", line 20, in <module>
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
Exception: VALIDATION FAILED: json-output - response content is empty
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
```

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpgpmjxvju/snippet.py", line 25, in <module>
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
Exception: VALIDATION FAILED: json-output stream - no content received
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
```
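Empty content here, next to the Qwen models being skipped for "Model emits thinking block in response", suggests the text may be arriving in a reasoning field rather than `content`. A sketch of a more forgiving accumulator, assuming a hypothetical `reasoning_content` attribute on the delta (the actual field name depends on the gateway):

```python
# Sketch: collect answer text and reasoning text separately from stream deltas.
# 'reasoning_content' is an assumed field name, not confirmed for this gateway.
from types import SimpleNamespace as NS


def accumulate(deltas):
    """Return (answer_text, reasoning_text) accumulated from stream deltas."""
    answer, reasoning = "", ""
    for delta in deltas:
        if getattr(delta, "content", None):
            answer += delta.content
        elif getattr(delta, "reasoning_content", None):
            reasoning += delta.reasoning_content
    return answer, reasoning


# Offline usage with stand-in deltas:
chunks = [
    NS(content=None, reasoning_content="thinking..."),
    NS(content='{"ok": true}', reasoning_content=None),
]
answer, reasoning = accumulate(chunks)
print(answer)     # -> {"ok": true}
print(reasoning)  # -> thinking...
```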

deepinfra/stepfun-ai/Step-3.5-Flash — json-output:stream (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp3pa0uph4/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.APIStatusError: Error code: 405 - {'status': 'failure', 'message': 'deepinfra error: json_object response format is not supported for model: stepfun-ai/Step-3.5-Flash', 'error': {'message': 'deepinfra error: json_object response format is not supported for model: stepfun-ai/Step-3.5-Flash', 'type': 'APIError', 'code': '405'}, 'error_origin_level': 'api_error', 'provider': 'deepinfra'}
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/stepfun-ai-Step-3.5-Flash",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
```

deepinfra/stepfun-ai/Step-3.5-Flash — json-output (failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmp5hfh375c/snippet.py", line 5, in <module>
    response = client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/resources/chat/completions/completions.py", line 1147, in create
    return self._post(
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.APIStatusError: Error code: 405 - {'status': 'failure', 'message': 'deepinfra error: json_object response format is not supported for model: stepfun-ai/Step-3.5-Flash', 'error': {'message': 'deepinfra error: json_object response format is not supported for model: stepfun-ai/Step-3.5-Flash', 'type': 'APIError', 'code': '405'}, 'error_origin_level': 'api_error', 'provider': 'deepinfra'}
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/stepfun-ai-Step-3.5-Flash",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
```
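The 405 above shows the provider rejecting `response_format={"type": "json_object"}` outright for this model. One generic defense is to retry without it and steer JSON via the prompt instead. Sketched with a plain callable so it stays provider-agnostic; in practice `create_fn` would be `client.chat.completions.create`:

```python
# Generic sketch: attempt response_format={"type": "json_object"} first;
# if the provider rejects it, retry with a prompt-only JSON instruction.
# 'create_fn' is a stand-in for client.chat.completions.create.

def create_with_json_fallback(create_fn, **kwargs):
    try:
        return create_fn(response_format={"type": "json_object"}, **kwargs)
    except Exception as exc:
        # Matches errors like: "json_object response format is not supported"
        if "response format is not supported" not in str(exc):
            raise
        messages = list(kwargs.pop("messages", []))
        messages.append({"role": "system", "content": "Respond with valid JSON only."})
        return create_fn(messages=messages, **kwargs)


# Offline usage with a fake endpoint that rejects json_object:
def fake_create(**kwargs):
    if "response_format" in kwargs:
        raise RuntimeError("deepinfra error: json_object response format is not supported")
    return {"messages_sent": len(kwargs["messages"])}


print(create_with_json_fallback(fake_create, messages=[{"role": "user", "content": "hi"}]))
# -> {'messages_sent': 2}
```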
Skipped (11)

deepinfra/black-forest-labs/FLUX-1.1-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/black-forest-labs/FLUX-2-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/fibo — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/replace_background — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/stabilityai/sdxl-turbo — skip-check (skipped)

Skip reason:

unsupported mode 'image'


github-actions Bot commented May 5, 2026

/test-models


github-actions Bot commented May 5, 2026

/test-models

@harshiv-26 (Collaborator, Author) commented:

Gateway test results

  • Total: 43
  • Passed: 30
  • Failed: 0
  • Validation failed: 2
  • Errored: 0
  • Skipped: 11
  • Success rate: 93.75%
| Provider | Model | Scenarios |
| --- | --- | --- |
| deepinfra | BAAI/bge-base-en-v1.5 | success: params |
| deepinfra | BAAI/bge-en-icl | success: params |
| deepinfra | Bria/fibo | skipped: skip-check |
| deepinfra | Bria/replace_background | skipped: skip-check |
| deepinfra | MiniMaxAI/MiniMax-M2.5 | success: tool-call, params, tool-call:stream, params:stream, reasoning:stream, reasoning<br>validation_failure: json-output, json-output:stream |
| deepinfra | PaddlePaddle/PaddleOCR-VL-0.9B | success: params:stream, params |
| deepinfra | Qwen/Qwen3.5-35B-A3B | success: params:stream, tool-call, tool-call:stream, params, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-397B-A17B | success: params:stream, tool-call:stream, tool-call, params, reasoning, reasoning:stream<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-4B | success: tool-call:stream, params:stream, params, tool-call, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | black-forest-labs/FLUX-1.1-pro | skipped: skip-check |
| deepinfra | black-forest-labs/FLUX-2-pro | skipped: skip-check |
| deepinfra | sentence-transformers/multi-qa-mpnet-base-dot-v1 | success: params |
| deepinfra | shibing624/text2vec-base-chinese | success: params |
| deepinfra | stabilityai/sdxl-turbo | skipped: skip-check |
Failures (2)

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpeaxvavce/snippet.py", line 20, in <module>
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
Exception: VALIDATION FAILED: json-output - response content is empty
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
```

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpefd0kqvq/snippet.py", line 25, in <module>
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
Exception: VALIDATION FAILED: json-output stream - no content received
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
```
Skipped (11)

deepinfra/black-forest-labs/FLUX-1.1-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/black-forest-labs/FLUX-2-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/fibo — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/replace_background — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/stabilityai/sdxl-turbo — skip-check (skipped)

Skip reason:

unsupported mode 'image'

@harshiv-26 (Collaborator, Author) commented:

Gateway test results

  • Total: 45
  • Passed: 32
  • Failed: 0
  • Validation failed: 2
  • Errored: 0
  • Skipped: 11
  • Success rate: 94.12%
| Provider | Model | Scenarios |
| --- | --- | --- |
| deepinfra | BAAI/bge-base-en-v1.5 | success: params |
| deepinfra | BAAI/bge-en-icl | success: params |
| deepinfra | Bria/fibo | skipped: skip-check |
| deepinfra | Bria/replace_background | skipped: skip-check |
| deepinfra | MiniMaxAI/MiniMax-M2.5 | success: params, tool-call, tool-call:stream, params:stream, reasoning:stream, reasoning<br>validation_failure: json-output:stream, json-output |
| deepinfra | PaddlePaddle/PaddleOCR-VL-0.9B | success: json-output, params, params:stream, json-output:stream |
| deepinfra | Qwen/Qwen3.5-35B-A3B | success: tool-call:stream, params:stream, tool-call, params, reasoning:stream, reasoning<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-397B-A17B | success: tool-call, tool-call:stream, params, params:stream, reasoning, reasoning:stream<br>skipped: json-output, json-output:stream |
| deepinfra | Qwen/Qwen3.5-4B | success: params, tool-call, params:stream, tool-call:stream, reasoning, reasoning:stream<br>skipped: json-output, json-output:stream |
| deepinfra | black-forest-labs/FLUX-1.1-pro | skipped: skip-check |
| deepinfra | black-forest-labs/FLUX-2-pro | skipped: skip-check |
| deepinfra | sentence-transformers/multi-qa-mpnet-base-dot-v1 | success: params |
| deepinfra | shibing624/text2vec-base-chinese | success: params |
| deepinfra | stabilityai/sdxl-turbo | skipped: skip-check |
Failures (2)

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output:stream (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmphg8cwlsv/snippet.py", line 25, in <module>
    raise Exception("VALIDATION FAILED: json-output stream - no content received")
Exception: VALIDATION FAILED: json-output stream - no content received
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=True,
)

import json as _json

_accumulated = ""
for chunk in response:
    if chunk.choices and len(chunk.choices) > 0:
        delta = chunk.choices[0].delta
        if delta.content is not None:
            _accumulated += delta.content
            print(delta.content, end="", flush=True)

if not _accumulated:
    raise Exception("VALIDATION FAILED: json-output stream - no content received")

_json.loads(_accumulated)
print("\nVALIDATION: json-output stream SUCCESS")
```

deepinfra/MiniMaxAI/MiniMax-M2.5 — json-output (validation_failure)

Error:

Traceback (most recent call last):
  File "/tmp/tmpdxq8mnbr/snippet.py", line 20, in <module>
    raise Exception("VALIDATION FAILED: json-output - response content is empty")
Exception: VALIDATION FAILED: json-output - response content is empty
Code snippet:

```python
from openai import OpenAI

client = OpenAI(api_key="***", base_url="https://internal.devtest.truefoundry.tech/api/llm")

response = client.chat.completions.create(
    model="test-v2-deepinfra/MiniMaxAI-MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "List 3 colors with their hex codes in JSON."},
    ],
    response_format={"type": "json_object"},
    stream=False,
)

import json as _json

_content = response.choices[0].message.content
print(_content)

if not _content:
    raise Exception("VALIDATION FAILED: json-output - response content is empty")

_json.loads(_content)
print("VALIDATION: json-output SUCCESS")
```
Skipped (11)

deepinfra/black-forest-labs/FLUX-1.1-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/black-forest-labs/FLUX-2-pro — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/fibo — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Bria/replace_background — skip-check (skipped)

Skip reason:

unsupported mode 'image'

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-35B-A3B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-397B-A17B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/Qwen/Qwen3.5-4B — json-output:stream (skipped)

Skip reason:

Model emits thinking block in response

deepinfra/stabilityai/sdxl-turbo — skip-check (skipped)

Skip reason:

unsupported mode 'image'

LordGameleo merged commit 63b1fce into main on May 5, 2026 (8 checks passed).
LordGameleo deleted the bot/update-deepinfra-20260502-021707 branch on May 5, 2026 at 14:07.