Original Logs
403: Whoops your key didn't work (2).
That looks bad, my key doesn't work: {'message': "403: Whoops your key didn't work (2).", ... 'path': ['bot_confirm_exists']}
3 exceptions in 5 min, exiting
Error Summary
Multiple isolated bot pods (tasktopus, botticelli, boss, frog, bob, karen) repeatedly fail GraphQL subscription/websocket auth with the same 403 message and restart. The failures happen in the client-side websocket/subscription runner used by stexe/btexe.
Stacktrace
flexus_client_kit/ckit_service_exec.py: run_typical_single_subscription_with_restart_on_network_errors
flexus_client_kit/ckit_bot_exec.py: subscription/auth handling around bot_confirm_exists
Root Cause
- File: flexus_client_kit/ckit_service_exec.py:46-56
- Function: run_typical_single_subscription_with_restart_on_network_errors
- Why: The code treats any 403: websocket transport error as a transient authentication failure, logs it, waits, and retries until three exceptions in five minutes occur. The repeated key/auth failure is not recoverable by retrying and causes CrashLoopBackOff in multiple bot pods.
- Git blame: @oleg Klimov in 4983917c / cc57e582 / 2821d11b (2025-10 to 2026-01 changes)
Code Snippet
err_str = str(e)
if "460:" in err_str:
    logger.error("%s", e)
    sys.exit(1)
elif "403:" in err_str:
    logger.error("Authentication failed - key doesn't work: %s", e)
else:
    nothing = isinstance(e, gql.transport.exceptions.TransportError)
    logger.info("got %s (attempt %d/3), sleep 60...", type(e).__name__, len(exception_times), exc_info=(not nothing))
await ckit_shutdown.wait(60)
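
One possible direction, sketched under assumptions rather than taken from the repository: classify a 403 as unrecoverable in the same way the snippet above already handles 460, so the runner stops instead of sleeping 60 seconds and retrying with a key that will not start working. The names classify_subscription_error, Action and FATAL_MARKERS are hypothetical and are not part of flexus_client_kit. The substring checks mirror the ones in the snippet.

```python
from enum import Enum


class Action(Enum):
    EXIT = "exit"    # unrecoverable: stop instead of retrying
    RETRY = "retry"  # possibly transient: sleep, then retry


# Assumption: both markers identify errors that retrying cannot fix.
# "460:" is already fatal in the snippet above; "403:" is the proposed addition.
FATAL_MARKERS = ("460:", "403:")


def classify_subscription_error(exc: Exception) -> Action:
    """Hypothetical helper: decide whether the runner should retry after exc."""
    err_str = str(exc)
    if any(marker in err_str for marker in FATAL_MARKERS):
        return Action.EXIT
    return Action.RETRY
```

With this helper, the 403 from the logs ("403: Whoops your key didn't work (2).") maps to Action.EXIT, while an ordinary connection error maps to Action.RETRY. Exiting early does not by itself stop Kubernetes from restarting the pod. It only makes the failure immediate and unambiguous, so the invalid key still has to be fixed or rotated separately.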
Affected
- Pods: tasktopus, botticelli, boss, frog, bob, karen bot pods in isolated
- Namespaces: isolated
- Occurrences: repeated across many pods