fix: retry transient HTTP 400 errors from upstream providers by BlueBoobyAI · Pull Request #1 · BlueBoobyAI/free-claude-code

BlueBoobyAI · 2026-06-14T17:02:07Z

Problem

Claude Code sessions crash multiple times per day when proxied through free-claude-code to DeepSeek V4 Flash:

Invalid request sent to provider. Request ID: req_d10fcb8d9fb7

The user sees a fatal error in their terminal:

Provider request failed unexpectedly.
Request ID: req_7317b1104a14

Root cause: DeepSeek occasionally returns HTTP 400 on transient internal hiccups (not real request bugs). The proxy's retryable_upstream_status() function at providers/rate_limit.py:28 only treats HTTP 429 and 5xx as retryable — 400 passes through immediately, raising InvalidRequestError → session crash → user restarts Claude Code.

A /messages POST rejected with 400 produces no completion and is not billed, so retrying it is safe.

Fix

providers/rate_limit.py — add if status == 400: return 400 to both branches of retryable_upstream_status():

# Before: 400 falls through to return None → immediate crash
if isinstance(exc, httpx.HTTPStatusError):
    status = exc.response.status_code
    if _upstream_http_retryable(status):  # only 429 + 5xx
        return status
    return None  # ← 400 dies here

# After: 400 enters the existing exponential-backoff retry loop
if isinstance(exc, httpx.HTTPStatusError):
    status = exc.response.status_code
    if _upstream_http_retryable(status):
        return status
    if status == 400:
        return 400  # ← transient 400 → retry
    return None
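The classification above can be exercised without httpx installed. The following is a minimal sketch, not the code in providers/rate_limit.py: `FakeHTTPStatusError` is a hypothetical stand-in for httpx.HTTPStatusError, and the body of `_upstream_http_retryable` is assumed from the "only 429 + 5xx" comment.

```python
from typing import Optional


class FakeHTTPStatusError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError; carries only a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def _upstream_http_retryable(status: int) -> bool:
    # Assumed body: the PR describes this gate as "only 429 + 5xx".
    return status == 429 or 500 <= status <= 599


def retryable_upstream_status(exc: Exception) -> Optional[int]:
    """Return the status code if the failure should be retried, else None."""
    if isinstance(exc, FakeHTTPStatusError):
        status = exc.status_code
        if _upstream_http_retryable(status):
            return status
        if status == 400:
            return 400  # transient 400: retried, but kept in its own branch
        return None
    return None
```

With this sketch, a 400 and a 503 both come back as retryable, while a 404 or an unrelated exception still returns None.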

Important safety detail: HTTP 400 retries do NOT call set_blocked() on the shared GlobalRateLimiter. Unlike 429/5xx (which signal upstream congestion worth a global pause), a transient 400 is a per-request hiccup. A genuine bad request (wrong model name) retries with individual backoff but does not stall concurrent requests.
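The difference between a per-request backoff and a global pause can be illustrated as follows. This is a sketch under stated assumptions: `UpstreamError`, `FakeLimiter`, and `execute_with_retry_sketch` are hypothetical names, the loop is synchronous for brevity, and the real execute_with_retry may be structured differently.

```python
from typing import Callable, List


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


class FakeLimiter:
    """Hypothetical stand-in for the shared GlobalRateLimiter."""

    def __init__(self) -> None:
        self.blocked_for: List[float] = []

    def set_blocked(self, seconds: float) -> None:
        # A real limiter would pause every in-flight request; this one only records.
        self.blocked_for.append(seconds)


def execute_with_retry_sketch(
    send: Callable[[], str],
    limiter: FakeLimiter,
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = lambda _s: None,  # no real sleeping in the sketch
) -> str:
    for attempt in range(max_attempts):
        try:
            return send()
        except UpstreamError as exc:
            retryable = exc.status in (400, 429) or 500 <= exc.status <= 599
            if not retryable or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            if exc.status != 400:
                # 429/5xx signal congestion: pause all requests via the shared limiter.
                limiter.set_blocked(delay)
            # A 400 backs off only this request.
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

In this sketch, a request that fails once with 400 and then succeeds leaves `limiter.blocked_for` empty, whereas the same sequence with a 429 records one global pause.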

Additional fixes per review:

retryable_upstream_status docstring updated to document 400 behavior
Log label changed from "Upstream server error (400)" to "Transient bad request (400)" — 400 is a client error, not server error
_upstream_http_retryable docstring notes 400 is intentionally excluded (separate branch to skip set_blocked)

Safety Evidence

All 1440 existing tests pass. New tests added:

Unit tests (`test_provider_rate_limit.py`)

test_execute_with_retry_400_retried_then_exhausts — httpx HTTPStatusError with 400: asserts 3 calls (1 initial + 2 retries)
test_execute_with_retry_400_then_200_recovers — transient 400 then 200: asserts call_count == 2 and result == "ok"
test_execute_with_retry_openai_400_retried_then_exhausts — openai.BadRequestError with 400: asserts 3 calls

Integration test (`test_anthropic_messages_429_retry.py`)

test_transient_400_is_retried_then_exhausts — real execute_with_retry, 5 send calls, SSE error envelope with "Invalid request sent to provider."

DeepSeek and other providers occasionally return HTTP 400 on transient internal failures (not a real request bug). The retry gate explicitly excluded 400, so these bypassed the retry loop and killed the session. Adding 400 to retryable_upstream_status() lets transient 400s enter the existing exponential-backoff retry loop (5 attempts, 2s base, 60s cap). Real 400s (malformed requests) simply retry to the same 400 — an extra fast request with no billing impact. Same pattern as AWS SDK's RetryMode.ADAPTIVE — classify transient service failures as retryable regardless of status code.
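To make the cost of retrying a genuine 400 concrete, the backoff schedule can be computed from the stated parameters (5 attempts, 2s base, 60s cap). This is a sketch that assumes plain doubling with no jitter; the commit message does not say whether the proxy adds jitter, and `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(attempts: int = 5, base: float = 2.0, cap: float = 60.0) -> List[float]:
    """Delays slept between attempts: base * 2**n, clamped to cap (no jitter assumed)."""
    return [min(cap, base * (2 ** n)) for n in range(attempts - 1)]


delays = backoff_delays()
total_wait = sum(delays)
```

Under those assumptions, five attempts are separated by four waits of 2, 4, 8, and 16 seconds, so a request that keeps returning 400 would surface its error after roughly 30 seconds of backoff plus request time.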

Three new tests in test_provider_rate_limit:
- test_execute_with_retry_400_retried_then_exhausts: asserts 3 calls
- test_execute_with_retry_400_then_200_recovers: asserts recovery
- test_execute_with_retry_openai_400_retried_then_exhausts: asserts 3 calls via openai SDK

One updated test in test_anthropic_messages_429_retry:
- test_transient_400_is_retried_then_exhausts: real execute_with_retry, 5 send calls, SSE error envelope with "Invalid request sent to provider."

All 1440 tests pass with these changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… requests

A genuine bad request (wrong model name, malformed prompt) should not block all concurrent proxy requests during retry backoff. Only 429 and 5xx signal upstream congestion worth a global pause.

Also fixes a duplicate @pytest.mark.asyncio decorator on the renamed test, and bumps the version to 1.2.42 per AGENTS.md requirements (version + uv.lock).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- retryable_upstream_status docstring now mentions 400 (no reactive block)
- Log label for 400 changed from "Upstream server error (400)" to "Transient bad request (400)", since 400 is a client error, not a server error
- _upstream_http_retryable docstring notes 400 is intentionally excluded (it lives in a separate branch to skip set_blocked)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add isinstance(exc, openai.BadRequestError): return 400 before the generic openai.APIError branch (BadRequestError is a subclass, so it would pass through the generic branch only if the status_code attr is present; this is defensive ordering)
- Use 0.5s base_delay for 400 retries vs 2s for 429/5xx (a transient DeepSeek hiccup resolves in <500ms; 2s was unnecessarily slow)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
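The ordering concern in the last commit can be shown with stand-in classes. This is a sketch rather than the merged code: `FakeAPIError` and `FakeBadRequestError` are hypothetical substitutes for openai.APIError and openai.BadRequestError, and `retryable_status` and `base_delay_for` are illustrative names.

```python
from typing import Optional


class FakeAPIError(Exception):
    """Hypothetical stand-in for openai.APIError (status_code may be absent)."""


class FakeBadRequestError(FakeAPIError):
    """Hypothetical stand-in for openai.BadRequestError, a subclass of the above."""


def retryable_status(exc: Exception) -> Optional[int]:
    # Subclass first: if the generic branch came first, it would also match
    # FakeBadRequestError and this branch would never run.
    if isinstance(exc, FakeBadRequestError):
        return 400
    if isinstance(exc, FakeAPIError):
        status = getattr(exc, "status_code", None)
        if status == 429 or (status is not None and 500 <= status <= 599):
            return status
        return None
    return None


def base_delay_for(status: int) -> float:
    # Values taken from the commit message: 0.5s for 400, 2s for 429/5xx.
    return 0.5 if status == 400 else 2.0
```

Checking the subclass first means a bad-request error is classified as 400 even when the exception object carries no status_code attribute, which is the defensive case the commit describes.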

BlueBoobyAI and others added 5 commits June 14, 2026 10:01

BlueBoobyAI wants to merge 5 commits into main from fix/retry-400-transient
