Add reverse distributed topology with coordinator-owned output suffix #430
Open
lobanov wants to merge 3 commits into
Summary
This change closes #428 by adding a second distributed inference topology in which the coordinator can own a final contiguous suffix of layers through output, selected with `--layers K:output`, while workers cover the lower layers `0..K-1`.

The goal is that the public session loop behaves the same regardless of topology:
- `ds4_session_sync()`
- `ds4_session_eval()`
- `ds4_session_sample()`

Forward topology remains unchanged:
- `0:K`
- `N:42` behavior is preserved

New reverse topology:
- `K:output`
- `K:42` is explicitly unsupported

This enables all existing frontends to run with a coordinator that prefills higher layers and runs the output head locally, without topology-specific frontend changes.
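The layer-spec rules described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from `ds4_distributed.c`: every name here (`parse_layer_spec`, `classify_coordinator_span`, `reverse_worker_span`, `struct layer_span`) is hypothetical, and the caller-supplied layer count is an assumption (the `42` in the specs above suggests layers numbered `0..42`).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; none of these are the ds4 API. */
enum span_kind { SPAN_INVALID = 0, SPAN_FORWARD, SPAN_REVERSE };

struct layer_span {
    int first;      /* first layer owned */
    int last;       /* last layer owned, inclusive */
    int has_output; /* nonzero if the span also owns the output head */
};

/* Parse "A:B" or "A:output". Returns 0 on success, -1 on malformed input. */
static int parse_layer_spec(const char *spec, int n_layers,
                            struct layer_span *out)
{
    assert(spec != NULL && out != NULL && n_layers > 0);

    char *end = NULL;
    long first = strtol(spec, &end, 10);
    if (end == spec || *end != ':' || first < 0 || first >= n_layers)
        return -1;

    const char *rhs = end + 1;
    long last;
    int has_output = 0;
    if (strcmp(rhs, "output") == 0) {
        last = n_layers - 1; /* the suffix runs through the final layer */
        has_output = 1;
    } else {
        last = strtol(rhs, &end, 10);
        if (end == rhs || *end != '\0' || last < first || last >= n_layers)
            return -1;
    }

    out->first = (int)first;
    out->last = (int)last;
    out->has_output = has_output;
    return 0;
}

/* Classify the coordinator's span (assumed rules, taken from the summary). */
static enum span_kind classify_coordinator_span(const struct layer_span *s)
{
    assert(s != NULL);
    if (s->first == 0 && !s->has_output)
        return SPAN_FORWARD;  /* 0:K */
    if (s->first > 0 && s->has_output)
        return SPAN_REVERSE;  /* K:output */
    return SPAN_INVALID;      /* e.g. K:42: a suffix that stops short of output */
}

/* For a reverse coordinator span K:output, workers cover layers 0..K-1. */
static int reverse_worker_span(const struct layer_span *coord,
                               struct layer_span *workers)
{
    assert(workers != NULL);
    if (classify_coordinator_span(coord) != SPAN_REVERSE)
        return -1;
    workers->first = 0;
    workers->last = coord->first - 1;
    workers->has_output = 0;
    return 0;
}
```

Under these assumptions and a layer count of 43, `0:19` classifies as forward, `20:output` as reverse (leaving `0..19` to the workers), and `20:42` parses but is rejected as a coordinator span. The sketch also rejects `0:output`, since a coordinator that owns every layer leaves nothing to distribute; whether the real validation does the same is not stated here.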
Main implementation changes
- `ds4_distributed.c`
  - `ACK_ONLY`
- `README.md`
  - `K:output` usage.
- `ds4_help.c`
  - `K:42` is unsupported.

Main function-level changes
Key functions updated or introduced in `ds4_distributed.c`:
- `dist_infer_coordinator_topology()`
- `dist_validate_layers_for_model()`
- `dist_coordinator_route_span()`
- `dist_coordinator_build_route_plan()`
- `dist_coordinator_report_plan()`
- `dist_coordinator_send_remote_work_on_fd()`
- `dist_coordinator_request_remote_on_fd()`
- `dist_coordinator_eval_local_suffix()`
- `dist_coordinator_eval_span()`
- `dist_coordinator_can_pipeline_prefill()`
- `dist_coordinator_prefill_prompt_pipelined()`
- `dist_coordinator_prefill_prompt()`
- `dist_worker_process_work_payload()`
- `dist_kv_route_build_owners()`
- `ds4_dist_session_save_payload()`
- `ds4_dist_session_load_payload()`

Added tests
Automated tests:
- Added `--distributed-topology-logic` to `ds4_test`.
  - `0:K`
  - `K:output`
  - `K:42`

Test support:
- Added `tests/ds4_distributed_test_internal.h` as a test-only bridge exposing the minimum internal distributed structs and thin internal entry points needed by `tests/ds4_test.c`.

Validation performed
Build validation:
- `make`
- `make ds4_test`
- `./ds4_test --distributed-topology-logic`
- `make test` (skipped `streaming-decode-prefill-correctness` and `mtp-verify-depth`)
- `make cpu`

Manual/local runtime distributed validation on localhost:
- `20:output`
- `0:19`
- `0:19`
- `20:output`
- `--dist-prefill-chunk 8 --dist-prefill-window 2`
- `ds4-eval` in distributed mode with reverse topology:
  - `--questions 4` match reference
  - 62/92 passed, 30 failed, runtime 00h:55m