
fix(run): inject API token for LB cross-endpoint calls #347

Merged
deanq merged 1 commit into main from deanquinanola/sls-336-flash-run-cross-endpoint-pipeline-calls-fail-401-api-token on Jun 29, 2026


deanq (Member) commented Jun 29, 2026

Summary

Under flash run, an LB endpoint that calls another endpoint (the pipeline/orchestration pattern) failed with 401 "no token provided" on the inter-endpoint call. flash deploy worked, so this was a live↔deploy asymmetry.

Root cause

flash run dispatches an LB route to a remote worker via lb_execute → ResourceManager.get_or_deploy_resource → _do_deploy → _inject_runtime_template_vars. That method only injected RUNPOD_API_KEY in its QB branch; the LB branch injected only FLASH_MODULE_PATH. So the LB worker running the route had no token and its cross-endpoint call returned 401.

flash deploy works because it builds resources through create_resource_from_manifest (runtime/resource_provisioner.py), which injects RUNPOD_API_KEY for any resource with makes_remote_calls=True. lb_execute bypasses that path entirely.
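For illustration, the pre-fix branching can be sketched as follows. This is a simplified, hypothetical stand-in, not the real ServerlessResource code: the Resource dataclass, the "QB"/"LB" endpoint_type strings, and passing api_key and module_path in as arguments (rather than reading them from the environment or the resource) are all assumptions made to keep the sketch self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical stand-in for a ServerlessResource; not the real class."""

    name: str
    endpoint_type: str  # assumed to be "QB" or "LB"
    makes_remote_calls: bool
    env: dict[str, str] = field(default_factory=dict)


def inject_runtime_template_vars_before_fix(
    res: Resource, api_key: str, module_path: str
) -> dict[str, str]:
    """Pre-fix behavior: the token is injected only in the QB branch."""
    env_dict = dict(res.env)
    if res.endpoint_type == "QB":
        # Only QB endpoints ever received the API token.
        if res.makes_remote_calls and "RUNPOD_API_KEY" not in env_dict:
            env_dict["RUNPOD_API_KEY"] = api_key
    else:
        # The LB branch set only the module path, so an LB route that
        # calls another endpoint ran without a token and got HTTP 401.
        env_dict["FLASH_MODULE_PATH"] = module_path
    return env_dict
```

Called with an LB resource that has makes_remote_calls=True, this returns an env containing FLASH_MODULE_PATH but no RUNPOD_API_KEY, which is exactly the lb_execute failure mode; the manifest path used by flash deploy never reaches this branch, so it was unaffected.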

Fix

Hoist the RUNPOD_API_KEY injection out of the QB-only branch in _inject_runtime_template_vars so it runs for any endpoint where _check_makes_remote_calls() is true, regardless of type. LB endpoints still also get FLASH_MODULE_PATH. The injection is idempotent — the existing "RUNPOD_API_KEY" not in env_dict guard skips it when the manifest path already populated env, so flash deploy behavior is unchanged.
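A minimal sketch of the hoisted logic, under the same assumptions as a simplified stand-in (hypothetical Resource dataclass; api_key and module_path passed in; any other type-specific variables the real method sets are omitted). The log message mirrors the line seen in the live smoke test, but the real implementation's logging call may differ.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class Resource:
    """Hypothetical stand-in for a ServerlessResource; not the real class."""

    name: str
    endpoint_type: str  # assumed to be "QB" or "LB"
    makes_remote_calls: bool
    env: dict[str, str] = field(default_factory=dict)


def inject_runtime_template_vars(
    res: Resource, api_key: str, module_path: str
) -> dict[str, str]:
    """Post-fix behavior: token injection no longer depends on endpoint type."""
    env_dict = dict(res.env)

    # Hoisted out of the QB-only branch. The "not in env_dict" guard makes
    # this idempotent: an env already populated by the manifest path
    # (flash deploy) keeps its existing value.
    if res.makes_remote_calls and "RUNPOD_API_KEY" not in env_dict:
        env_dict["RUNPOD_API_KEY"] = api_key
        logger.info(
            "%s: Injected RUNPOD_API_KEY for remote calls (makes_remote_calls=True)",
            res.name,
        )

    # LB endpoints still get FLASH_MODULE_PATH, as before the fix.
    if res.endpoint_type == "LB":
        env_dict["FLASH_MODULE_PATH"] = module_path

    return env_dict
```

Passing a resource whose env already holds {"RUNPOD_API_KEY": "manifest"} returns that value unchanged, which is why flash deploy behavior is preserved; an LB resource with makes_remote_calls=True now gets both the token and FLASH_MODULE_PATH.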

Test plan

  • Added test_do_deploy_lb_injects_api_key_when_makes_remote_calls (TDD: verified RED before fix, GREEN after).
  • make quality-check passes: ruff format + lint clean, full suite green, coverage 85.99% (≥65%).
  • Not yet exercised live; a true end-to-end check is running flash run in 01_getting_started/03_mixed_workers and confirming that POST /pipeline/classify returns COMPLETED.
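The RED/GREEN cycle behind the regression test can be sketched in pytest style. Everything here is hypothetical: the real test_do_deploy_lb_injects_api_key_when_makes_remote_calls presumably exercises the actual _do_deploy path with its own fixtures (judging by its name), whereas this sketch swaps in two tiny stand-ins for the before and after logic, a placeholder token ("test-token"), and a placeholder module path ("app.routes"). Plain test_* functions with bare asserts need no pytest import.

```python
def inject_before_fix(endpoint_type: str, makes_remote_calls: bool, env: dict) -> dict:
    """Hypothetical stand-in for the pre-fix logic (QB-only token injection)."""
    env = dict(env)
    if endpoint_type == "QB":
        if makes_remote_calls and "RUNPOD_API_KEY" not in env:
            env["RUNPOD_API_KEY"] = "test-token"
    else:
        env["FLASH_MODULE_PATH"] = "app.routes"
    return env


def inject_after_fix(endpoint_type: str, makes_remote_calls: bool, env: dict) -> dict:
    """Hypothetical stand-in for the post-fix logic (type-independent injection)."""
    env = dict(env)
    if makes_remote_calls and "RUNPOD_API_KEY" not in env:
        env["RUNPOD_API_KEY"] = "test-token"
    if endpoint_type == "LB":
        env["FLASH_MODULE_PATH"] = "app.routes"
    return env


def check_lb_injects_api_key_when_makes_remote_calls(inject) -> None:
    # Named check_* (not test_*) so pytest does not try to collect it directly.
    env = inject("LB", makes_remote_calls=True, env={})
    assert env.get("RUNPOD_API_KEY") == "test-token"
    assert "FLASH_MODULE_PATH" in env


def test_red_before_fix() -> None:
    try:
        check_lb_injects_api_key_when_makes_remote_calls(inject_before_fix)
    except AssertionError:
        return  # RED: the old logic leaves the LB endpoint without a token
    raise AssertionError("expected the pre-fix logic to fail the check")


def test_green_after_fix() -> None:
    # GREEN: the hoisted injection gives the LB endpoint its token.
    check_lb_injects_api_key_when_makes_remote_calls(inject_after_fix)
```

Keeping a RED assertion in a real suite would be unusual; it appears here only to make the before/after contrast explicit.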

flash run dispatches LB routes to a remote worker via lb_execute, which
bypasses create_resource_from_manifest, the path that injects
RUNPOD_API_KEY in flash deploy. _inject_runtime_template_vars only
injected the token for QB endpoints, so an LB endpoint making a
cross-endpoint call (e.g. the pipeline example) ran without a token and
failed with HTTP 401 "no token provided".

Inject RUNPOD_API_KEY for any endpoint with makes_remote_calls=True
regardless of type, keeping flash run and flash deploy symmetric. The
injection is idempotent: the existing-key guard skips it when the
manifest path already populated env.

Fixes SLS-336.
promptless (Bot) commented Jun 29, 2026

Promptless prepared a documentation update related to this change.

Triggered by flash PR #347 (fix: inject API token for LB cross-endpoint calls, SLS-336)

Since this fix makes cross-endpoint (pipeline) calls work under flash dev/flash run — matching flash deploy — the suggestion adds a short note to the cross-endpoint communication section confirming the pipeline pattern can be tested locally before deploying. No stale content needed correcting.

Review: Note that cross-endpoint calls work locally under flash dev/run

Copilot AI (Contributor) left a comment

Pull request overview

Fixes a flash run vs flash deploy behavior gap where load-balanced (LB) endpoints that make cross-endpoint calls were missing RUNPOD_API_KEY, causing inter-endpoint requests to fail with 401 "no token provided".

Changes:

  • Updated ServerlessResource._inject_runtime_template_vars() to inject RUNPOD_API_KEY for any endpoint type when _check_makes_remote_calls() is true, not just QB endpoints.
  • Added a unit regression test ensuring LB deployments inject RUNPOD_API_KEY when remote calls are enabled.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • src/runpod_flash/core/resources/serverless.py: Hoists API key injection out of the QB-only branch so LB endpoints provisioned via flash run also receive RUNPOD_API_KEY when needed.
  • tests/unit/resources/test_serverless.py: Adds regression coverage for the LB deploy path injecting RUNPOD_API_KEY when makes_remote_calls=True.


deanq (Member, Author) commented Jun 29, 2026

Live smoke test — passed

Ran flash run (with this branch's flash) in 01_getting_started/03_mixed_workers and POST /pipeline/classify:

  • HTTP 200, full pipeline completed end-to-end: preprocessing → GPU inference (RTX 4090) → postprocessing (53.9s incl. cold starts).
  • Dev-server log confirms the fix firing on the LB endpoint, which previously did not happen:
    live-01_03_classify_pipeline: Injected RUNPOD_API_KEY for remote calls (makes_remote_calls=True)
    
  • Cross-endpoint calls authenticated (no 401 "no token provided"), and execution completed.

All four provisioned live-* endpoints were undeployed afterward.

deanq changed the title from "fix(run): inject API token for LB cross-endpoint calls (SLS-336)" to "fix(run): inject API token for LB cross-endpoint calls" on Jun 29, 2026
deanq merged commit 8bf0e7e into main on Jun 29, 2026
15 of 17 checks passed
deanq deleted the deanquinanola/sls-336-flash-run-cross-endpoint-pipeline-calls-fail-401-api-token branch on June 29, 2026 at 17:49