Add reverse distributed topology with coordinator-owned output suffix #430

Open: lobanov wants to merge 3 commits into antirez:main from lobanov:reverse-topology-pr

@lobanov lobanov commented Jun 16, 2026

Summary

This change closes #428 by adding a second distributed inference topology in which the coordinator owns a final contiguous suffix of layers through the output head, selected with --layers K:output, while workers cover the lower layers 0..K-1.

The goal is that the public session loop behaves the same regardless of topology:

  • ds4_session_sync()
  • ds4_session_eval()
  • ds4_session_sample()

Forward topology remains unchanged:

  • coordinator owns 0:K
  • workers cover higher layers
  • existing forward N:42 behavior is preserved

New reverse topology:

  • coordinator owns K:output
  • workers cover the lower prefix
  • reverse K:42 is explicitly unsupported
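
The two accepted coordinator layouts can be illustrated with a small classifier. This is a minimal sketch, not the code in ds4_distributed.c: the enum, the function name, and the parsing rules are hypothetical, and it does not check the range against the model's real layer count. It also treats 0:output as a prefix, which is an assumption the description does not address.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not identifiers from ds4_distributed.c. */
typedef enum {
    TOPO_INVALID = 0,
    TOPO_FORWARD,   /* coordinator owns a prefix starting at layer 0 */
    TOPO_REVERSE    /* coordinator owns a suffix ending at the output head */
} topo_kind;

/* Classify a coordinator --layers spec of the form "A:B", where B is either
 * a layer number or the literal "output". */
static topo_kind classify_coordinator_layers(const char *spec) {
    const char *colon = strchr(spec, ':');
    if (colon == NULL || colon == spec || colon[1] == '\0') return TOPO_INVALID;

    char *end = NULL;
    long start = strtol(spec, &end, 10);
    if (end != colon || start < 0) return TOPO_INVALID;

    bool ends_at_output = strcmp(colon + 1, "output") == 0;
    if (!ends_at_output) {
        long last = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || last < start) return TOPO_INVALID;
    }

    if (start == 0) return TOPO_FORWARD;       /* prefix, e.g. 0:19 (assumed to include 0:output) */
    if (ends_at_output) return TOPO_REVERSE;   /* suffix, e.g. 20:output */
    return TOPO_INVALID;                       /* rejects K:42 and middle-only slices */
}
```

Under these rules a numeric end with a non-zero start is always rejected, which covers both the unsupported reverse K:42 form and middle-only slices with a single check.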

This enables all existing frontends to run with a coordinator that prefills higher layers and runs the output head locally, without topology-specific frontend changes.

Main implementation changes

ds4_distributed.c

  • Added internal topology inference and validation for coordinator-owned prefix vs suffix.
  • Generalized route planning so the coordinator can plan either:
    • forward routes covering layers after its local slice
    • reverse routes covering layers before its local slice
  • Updated route reporting and completeness checks to be topology-aware.
  • Made decode execution topology-aware:
    • forward remains local-first then remote
    • reverse is remote-first then local suffix/output
  • Extended serial prefill to work in reverse topology through the same session-facing flow.
  • Implemented reverse pipelined prefill:
    • workers process each chunk first
    • each chunk returns hidden state upstream
    • coordinator applies its local suffix on every returned chunk
    • reverse mode does not use forward-style non-final ACK_ONLY
  • Hardened worker-side route validation so remote-first routes must return hidden state upstream rather than remote logits.
  • Generalized distributed KV shard ownership/save/load logic from “local shard first” to ordered ownership by layer range, so reverse topology persists correctly.
  • Fixed reverse pipelined prefill reset symmetry so checkpoint/suffix sync cannot reset the local suffix when the remote prefix was intentionally not reset.
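
The reverse pipelined prefill flow described above can be modeled as a small simulation. This is only a sketch under stated assumptions: it assumes the prefill window bounds the number of chunks in flight on the worker side, which the description does not confirm, and the struct and function names are hypothetical. No networking or tensor work is performed; the counters stand in for it.

```c
/* Hypothetical counters for illustration; not types from ds4_distributed.c. */
typedef struct {
    int sent;           /* chunks handed to the worker chain */
    int returned;       /* chunks whose hidden state came back upstream */
    int local;          /* chunks run through the coordinator's local suffix */
    int max_in_flight;  /* largest number of chunks outstanding at once */
} prefill_stats;

/* Simulate reverse pipelined prefill: workers process each chunk first, every
 * chunk returns hidden state, and the coordinator applies its local suffix to
 * every returned chunk (no ACK-only chunks). */
static prefill_stats reverse_prefill_sim(int n_tokens, int chunk, int window) {
    prefill_stats s = {0, 0, 0, 0};
    if (n_tokens <= 0 || chunk <= 0 || window <= 0) return s;

    int n_chunks = (n_tokens + chunk - 1) / chunk;  /* ceiling division */
    while (s.local < n_chunks) {
        /* Keep the worker side busy up to the assumed window limit. */
        while (s.sent < n_chunks && s.sent - s.returned < window) {
            s.sent++;
            int in_flight = s.sent - s.returned;
            if (in_flight > s.max_in_flight) s.max_in_flight = in_flight;
        }
        s.returned++;  /* oldest outstanding chunk returns its hidden state */
        s.local++;     /* coordinator runs its suffix layers on that chunk */
    }
    return s;
}
```

With a 20-token prompt, a chunk size of 8, and a window of 2, the simulation sends three chunks, never has more than two outstanding, and applies the local suffix three times, once per returned chunk.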

README.md

  • Documented reverse distributed topology and coordinator K:output usage.
  • Clarified protocol behavior for forward vs reverse execution and pipelined prefill.

ds4_help.c

  • Updated distributed help text to describe both supported coordinator layouts and note that reverse K:42 is unsupported.

Main function-level changes

Key functions updated or introduced in ds4_distributed.c:

  • dist_infer_coordinator_topology()
  • dist_validate_layers_for_model()
  • dist_coordinator_route_span()
  • dist_coordinator_build_route_plan()
  • dist_coordinator_report_plan()
  • dist_coordinator_send_remote_work_on_fd()
  • dist_coordinator_request_remote_on_fd()
  • dist_coordinator_eval_local_suffix()
  • dist_coordinator_eval_span()
  • dist_coordinator_can_pipeline_prefill()
  • dist_coordinator_prefill_prompt_pipelined()
  • dist_coordinator_prefill_prompt()
  • dist_worker_process_work_payload()
  • dist_kv_route_build_owners()
  • ds4_dist_session_save_payload()
  • ds4_dist_session_load_payload()
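
As an illustration of the kind of computation a route-span helper might perform, the sketch below derives the layer range that workers must cover from the coordinator's local slice. It is not the implementation of dist_coordinator_route_span(): the types, the function name, and the n_layers parameter are hypothetical, ranges are assumed inclusive (matching the 0:19 and 20:output example in the validation notes), and the output head itself is not modeled.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; both ends of a span are inclusive. */
typedef struct { int first; int last; } layer_span;
typedef enum { SPAN_FORWARD, SPAN_REVERSE } span_topology;

/* Compute the span that remote workers must cover. Returns false when the
 * local slice does not fit the topology or leaves nothing for workers. */
static bool remote_span_for(span_topology topo, layer_span local,
                            int n_layers, layer_span *out) {
    if (n_layers <= 0 || local.first < 0 || local.last < local.first ||
        local.last >= n_layers) {
        return false;
    }
    if (topo == SPAN_FORWARD) {
        /* Coordinator owns a prefix; workers take everything after it. */
        if (local.first != 0 || local.last == n_layers - 1) return false;
        out->first = local.last + 1;
        out->last = n_layers - 1;
    } else {
        /* Coordinator owns a suffix; workers take everything before it. */
        if (local.last != n_layers - 1 || local.first == 0) return false;
        out->first = 0;
        out->last = local.first - 1;
    }
    return true;
}
```

For a model assumed to have 43 layers, a reverse coordinator slice of 20..42 yields a remote span of 0..19, and a forward slice of 0..19 yields 20..42.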

Added tests

Automated tests:

  • Added --distributed-topology-logic to ds4_test.
  • This covers:
    • coordinator layer validation for forward 0:K
    • coordinator layer validation for reverse K:output
    • rejection of reverse K:42
    • rejection of middle-only coordinator slices
    • forward route planning over synthetic worker layouts
    • reverse route planning over synthetic worker layouts
    • assertion that reverse routes do not assign remote output logits
    • KV owner ordering for reverse topology
    • KV owner gap rejection
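
The KV owner ordering and gap rejection cases can be pictured with the following sketch. It assumes owners are described by inclusive layer ranges and that a valid set must tile the layers 0..n_layers-1 with no gaps or overlaps. The kv_owner struct and function names are hypothetical and are not the internal structs exposed through tests/ds4_distributed_test_internal.h.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical owner record; both ends of the layer range are inclusive. */
typedef struct { int first; int last; int node_id; } kv_owner;

/* Order owners by their first layer, regardless of which node is local. */
static int kv_owner_cmp(const void *a, const void *b) {
    const kv_owner *x = a;
    const kv_owner *y = b;
    return (x->first > y->first) - (x->first < y->first);
}

/* Sort owners by layer range, then verify they cover 0..n_layers-1 exactly.
 * Returns false on any gap, overlap, or malformed range. */
static bool kv_owners_order_and_check(kv_owner *owners, size_t n, int n_layers) {
    if (owners == NULL || n == 0 || n_layers <= 0) return false;
    qsort(owners, n, sizeof owners[0], kv_owner_cmp);

    int next = 0;  /* next layer that must be owned */
    for (size_t i = 0; i < n; i++) {
        if (owners[i].first != next || owners[i].last < owners[i].first) {
            return false;
        }
        next = owners[i].last + 1;
    }
    return next == n_layers;
}
```

In a reverse layout the coordinator's shard (here node 0, owning 20..42) sorts after the worker's shard (node 1, owning 0..19), so ordering by layer range rather than "local shard first" keeps saved and loaded shards in layer order.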

Test support:

  • Added tests/ds4_distributed_test_internal.h as a test-only bridge exposing the minimum internal distributed structs and thin internal entry points needed by tests/ds4_test.c.
  • Kept test setup logic in the test file rather than adding heavy public test helper APIs.

Validation performed

Build and test validation:

  • make
  • make ds4_test
  • ./ds4_test --distributed-topology-logic
  • make test (skipped streaming-decode-prefill-correctness and mtp-verify-depth)
  • make cpu

Manual/local runtime distributed validation on localhost:

  • Forward topology:
    • worker 20:output
    • coordinator 0:19
    • verified route formation, prompt prefill, and one-token generation
  • Reverse topology:
    • worker 0:19
    • coordinator 20:output
    • verified route formation, prompt prefill, and one-token generation
  • Pipelined prefill:
    • ran long-prompt tests with --dist-prefill-chunk 8 --dist-prefill-window 2
    • confirmed pipelined distributed prefill in both forward and reverse mode
    • confirmed successful generation after prefill in both modes
  • Running ds4-eval in distributed mode with reverse topology:
    • Responses to --questions 4 match reference
    • Full suite produces result: 62/92 passed, 30 failed, runtime 00h:55m

Successfully merging this pull request may close these issues.

Allow coordinator to own N:output layers in distributed mode
