bug: 403 GraphQL subscription auth failures in bot websocket client #363

@flexus-teams

Description

Original Logs

403: Whoops your key didn't work (2).
That looks bad, my key doesn't work: {'message': "403: Whoops your key didn't work (2).", ... 'path': ['bot_confirm_exists']}
3 exceptions in 5 min, exiting

Error Summary

Multiple isolated bot pods (tasktopus, botticelli, boss, frog, bob, karen) repeatedly fail GraphQL subscription/websocket auth with the same 403 message and restart. The failures happen in the client-side websocket/subscription runner used by stexe/btexe.

Stacktrace

flexus_client_kit/ckit_service_exec.py: run_typical_single_subscription_with_restart_on_network_errors
flexus_client_kit/ckit_bot_exec.py: subscription/auth handling around bot_confirm_exists

Root Cause

  • File: flexus_client_kit/ckit_service_exec.py:46-56
  • Function: run_typical_single_subscription_with_restart_on_network_errors
  • Why: The handler logs any error whose message contains "403:" and then falls through to the same 60-second wait-and-retry path used for transient network errors, until three exceptions in five minutes trigger an exit. A rejected key is not recoverable by retrying, so each affected pod exits, restarts, and ends up in CrashLoopBackOff.
  • Git blame: @oleg Klimov in 4983917c / cc57e582 / 2821d11b (2025-10 to 2026-01 changes)
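
To illustrate the "three exceptions in five minutes" budget described under Why, here is a minimal sketch of a sliding-window counter. The names RetryBudget and seconds_until_exit are hypothetical and do not come from flexus_client_kit; the real code's bookkeeping around exception_times may differ. Under these assumptions, an error that recurs immediately after every 60-second wait exhausts the budget after about two minutes of sleeping.

```python
from collections import deque


class RetryBudget:
    """Hypothetical sliding-window counter: give up after `limit` errors within `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 300.0):
        self.limit = limit
        self.window = window
        self.times = deque()

    def record(self, now: float) -> bool:
        """Record an error at time `now`; return True when the budget is exhausted."""
        self.times.append(now)
        # Drop errors that have fallen out of the window.
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.limit


def seconds_until_exit(sleep: float = 60.0, limit: int = 3, window: float = 300.0) -> float:
    """Simulate an error that recurs immediately after every sleep (assumed behaviour for a bad key)."""
    if sleep * (limit - 1) > window:
        raise ValueError("errors are too far apart to ever exhaust the budget")
    budget = RetryBudget(limit, window)
    now = 0.0
    while not budget.record(now):
        now += sleep
    return now
```

With the defaults, seconds_until_exit() returns 120.0: errors at 0 s, 60 s and 120 s all fall inside one five-minute window, so a persistent 403 leads to an exit on the third attempt.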

Code Snippet

            err_str = str(e)
            if "460:" in err_str:
                logger.error("%s", e)
                sys.exit(1)
            elif "403:" in err_str:
                logger.error("Authentication failed - key doesn't work: %s", e)
            else:
                nothing = isinstance(e, gql.transport.exceptions.TransportError)
                logger.info("got %s (attempt %d/3), sleep 60...", type(e).__name__, len(exception_times), exc_info=(not nothing))
            await ckit_shutdown.wait(60)
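
One possible direction, offered as a sketch and not as the maintainers' fix, is to separate error classification from the retry loop so that a rejected key gets its own policy instead of the generic 60-second retry. The names Action, classify_error and auth_backoff_seconds are hypothetical, and the backoff values are placeholders. Whether a 403 should exit, back off, or raise an alert is a decision this issue does not settle; note that exiting immediately would still restart the pod.

```python
import enum


class Action(enum.Enum):
    EXIT = "exit"                  # unrecoverable, stop the process
    AUTH_BACKOFF = "auth_backoff"  # credentials rejected, retry rarely
    RETRY = "retry"                # likely transient, retry after a short sleep


def classify_error(err: Exception) -> Action:
    """Map an exception to an action using the same substring checks as the snippet above."""
    err_str = str(err)
    if "460:" in err_str:
        return Action.EXIT
    if "403:" in err_str:
        return Action.AUTH_BACKOFF
    return Action.RETRY


def auth_backoff_seconds(consecutive_failures: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` (placeholder values)."""
    return min(cap, base * 2 ** max(0, consecutive_failures - 1))
```

In the retry loop, classify_error(e) would replace the if/elif chain, and the AUTH_BACKOFF branch would wait auth_backoff_seconds(n) instead of a fixed 60 seconds, keeping a misconfigured key from burning through the exception budget.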

Affected

  • Pods: tasktopus, botticelli, boss, frog, bob, karen
  • Namespaces: isolated
  • Occurrences: repeated across many pods

Labels

bug (Something isn't working)
