feat: add /swarm parallel agent-swarm orchestration #208
🦋 Changeset detected. Latest commit: 1461143. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc5e4bf787
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if (typeof parsed !== 'object' || parsed === null) return null;

    const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
    if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
Enforce the planner's subtask cap
When the planner returns valid JSON with more than the prompted maximum of 6 subtasks, this accepts the entire array; SwarmCoordinator.runWave then iterates every entry and spawns a subagent for each one, only limiting concurrent workers to 4. In the common failure mode where the LLM ignores the cap or emits a large accidental list, /swarm can launch dozens of subagents and burn substantial time/tokens instead of retrying or rejecting the invalid plan. Please validate subtasksRaw.length <= 6 here (or truncate deliberately) before spawning workers.
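A minimal sketch of the cap check this comment asks for, assuming a standalone parser. The names MAX_SUBTASKS and parsePlanSketch are hypothetical and are not taken from the PR; only the cap of 6 and the three subtask fields come from the review text.

```typescript
// Hypothetical sketch: MAX_SUBTASKS and parsePlanSketch are illustrative names.
interface SubtaskSketch {
  role: string;
  systemPrompt: string;
  prompt: string;
}

// The planner prompt advertises at most 6 subtasks.
const MAX_SUBTASKS = 6;

function parsePlanSketch(raw: string): SubtaskSketch[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== 'object' || parsed === null) return null;

  const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
  if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
  // Reject oversized plans so the caller's retry path can re-prompt,
  // instead of spawning one worker per entry.
  if (subtasksRaw.length > MAX_SUBTASKS) return null;

  const out: SubtaskSketch[] = [];
  for (const item of subtasksRaw) {
    if (typeof item !== 'object' || item === null) return null;
    const o = item as Record<string, unknown>;
    const role = o['role'];
    const systemPrompt = o['systemPrompt'];
    const prompt = o['prompt'];
    if (
      typeof role !== 'string' ||
      typeof systemPrompt !== 'string' ||
      typeof prompt !== 'string'
    ) {
      return null;
    }
    out.push({ role, systemPrompt, prompt });
  }
  return out;
}
```

Returning null (rather than truncating) keeps the decision with the caller; deliberate truncation would be a different but equally explicit policy.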
        return;
      }
      try {
        await session.prompt(buildSwarmPrompt(task));
Handle sessions whose active tools lack Swarm
This directly prompts the current session to call Swarm, but resumed sessions created before this commit replay their old tools.set_active_tools record from the wire, so their active tool list does not include the newly added Swarm entry from agent.yaml. In that context /swarm <task> is accepted by the TUI but the model is asked to use a tool that is not exposed, so the command fails or devolves into normal chat; migrate old agent tool lists or check tool availability before sending this framed prompt.
…ride test and changeset
…clean up on reset
The stall-detection repeat key joined the tool name and canonical args with a literal NUL (0x00) separator. The control byte caused git to classify stall-hook.ts as binary, so diffs, blame, and code review on the file were opaque, which prevented confirming the test history for this feature.
Replace the NUL with a normal space (tool names are identifiers and never contain spaces, so keys stay collision-free) so the file is plain UTF-8 text and remains reviewable. Behavior is unchanged: the key still uniquely combines tool name and canonical args.
Verified by reverting the hook to a no-op stub to show the three stall-detection test files go red (the discriminating block, canonical-key, e2e turn-abort, and worker-stall-translation cases all fail), then restoring the real implementation to confirm they pass. This supplies the failing-first evidence that the prior atomic commit never recorded. Full suite: 5049 passed / 25 skipped; make typecheck clean.
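To illustrate the repeat key described in this commit, here is a rough sketch. The canonicalization shown (recursively sorted object keys) is an assumption, as are the names canonicalizeSketch and repeatKeySketch; the commit only states that the key joins the tool name and canonical args with a space.

```typescript
// Assumption: "canonical args" means a JSON encoding with sorted object keys.
function canonicalizeSketch(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalizeSketch(v)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalizeSketch(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  // Strings, numbers, booleans, and null.
  return JSON.stringify(value);
}

// Tool names are identifiers without spaces, so the first space in the key
// always marks the boundary between the name and the canonical args.
function repeatKeySketch(toolName: string, args: unknown): string {
  return `${toolName} ${canonicalizeSketch(args)}`;
}
```

Two calls with the same tool and the same args in a different key order produce the same key, and the key contains no control bytes.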
…nerate/reassign/drop)
…on-aware recovery UI
The summary-continuation pass re-prompted any subagent whose first
summary was under 200 chars to "expand" it, then read back the
follow-up turn — replacing the original output rather than appending.
For swarm's structured-output subagents this was harmful: a reviser's
compact decision JSON (e.g. {"kind":"retry"}) is always under the
threshold, so the expand turn always fired and could replace the JSON
with prose, silently degrading the recovery loop into conservative
drops. It also taxed every short-but-complete handoff with an extra
turn.
Remove the heuristic entirely so a subagent's first summary is returned
as-is. The max-tokens truncation guard is unaffected.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cc9176b3d0
    const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
    if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;
Enforce the advertised subtask cap
The planner prompt says to keep the plan to at most 6 subtasks, but this parser accepts any non-empty subtasks array. If the model returns dozens or hundreds of items, runWithRetries will execute every one of them (bounded only by concurrency), which can turn a single /swarm call into unexpected token/tool spend and a very long run instead of retrying/rejecting the invalid plan.
    try {
      await session.prompt(buildSwarmPrompt(task));
Route /swarm through the normal send lifecycle
This calls session.prompt directly, so the TUI never runs the normal sendMessageInternal setup (beginSessionRequest, streaming state, transcript entry, and queue handling). During the initial model latency before any SDK event arrives, the app still considers itself idle, so another user input or idle-only slash command can be accepted and race with the swarm turn instead of being blocked/queued like a normal prompt.
      await runChildTurnToCompletion(child, options.signal);
      result = lastAssistantText(child);
    }
    const result = lastAssistantText(child);
Preserve continuation for regular subagents
This now returns the first assistant message for every subagent, removing the previous follow-up that expanded summaries shorter than 200 characters. That may be useful for swarm planner/reviser JSON, but it also changes normal Agent/explore subagents: a terse answer such as “Done” is handed back to the parent without the bounded expansion turn, leaving the parent under-informed. Scope the raw-result behavior to the swarm/profileOverride path rather than all subagents.
    if (result.is_error === true) {
      this.swarmModel = applySwarmEvent(this.swarmModel, { t: 'cancelled' });
Preserve non-cancel swarm errors
This treats every Swarm error result as cancelled, but SwarmTool returns isError for ordinary failures too, such as planner JSON failures or synthesizer errors. Because swarm cards also skip rendering the tool result body, those failures are displayed as a cancelled dashboard with the actual Swarm failed: ... message hidden from the user; only genuine abort/cancel errors should take this path.
    // visible but mark it retrying (an in-flight, uncounted state) so the
    // re-spawn can collapse onto it. Carries no subagent id, so we match by
    // role against the most recent terminal/retrying row.
    const prior = findReusableRoleRow(model.workers, event.role);
Correlate retries by subtask, not role
When a plan contains two subtasks with the same role and both reach a terminal state, recovery events only carry the role to the reducer, so findReusableRoleRow can mark/re-key/drop the wrong row (the most recently inserted matching role) even though the coordinator emitted a distinct subtaskId. This makes the swarm dashboard inaccurate for duplicate-role plans; use the subtask identity or preserve a subtask-to-worker mapping instead of matching solely by role.
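A sketch of what correlating by subtask identity could look like in a reducer. The row shape and the name markRetryingSketch are hypothetical; only subtaskId, role, and the retrying state come from the comment and snippet above.

```typescript
// Hypothetical row shape for illustration.
type RowStateSketch = 'running' | 'done' | 'failed' | 'retrying';

interface WorkerRowSketch {
  subtaskId: string;
  role: string;
  state: RowStateSketch;
}

// Mark exactly the failed row that belongs to the given subtask as retrying.
// Matching on subtaskId stays unambiguous when two subtasks share a role.
function markRetryingSketch(
  rows: readonly WorkerRowSketch[],
  subtaskId: string,
): WorkerRowSketch[] {
  return rows.map((row): WorkerRowSketch =>
    row.subtaskId === subtaskId && row.state === 'failed'
      ? { ...row, state: 'retrying' }
      : row,
  );
}
```

With two failed rows that both have the role reviewer, a recovery event for the first subtask changes only the first row, which role-based matching cannot guarantee.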
handleSwarmCommand called session.prompt directly, bypassing beginSessionRequest. streamingPhase therefore stayed 'idle' until the SDK turn.started event round-tripped back, leaving a startup window in which a fast follow-up message was dispatched as a second concurrent prompt and silently dropped by the core as agent_busy, and in which the UI showed no waiting state. Call beginSessionRequest() before prompting — flipping streamingPhase synchronously so the input gate closes immediately and the waiting pane shows — and failSessionRequest() on a prompt rejection, mirroring sendSkillActivation / handleInitCommand.
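The ordering this commit describes can be sketched as follows. The interfaces are simplified stand-ins, and the failSessionRequest signature (a single reason string) is an assumption rather than the real host API.

```typescript
// Simplified, hypothetical stand-ins for the slash-command host and session.
interface SwarmHostSketch {
  beginSessionRequest(): void;
  failSessionRequest(reason: string): void; // assumed signature
}

interface SessionSketch {
  prompt(text: string): Promise<void>;
}

async function runSwarmCommandSketch(
  host: SwarmHostSketch,
  session: SessionSketch,
  framedPrompt: string,
): Promise<void> {
  // Runs synchronously, before the first await, so the input gate closes
  // before any other user input can be dispatched.
  host.beginSessionRequest();
  try {
    await session.prompt(framedPrompt);
  } catch (err) {
    // Reopen the gate if the prompt itself is rejected.
    host.failSessionRequest(err instanceof Error ? err.message : String(err));
  }
}
```

Because an async function executes up to its first await synchronously, the caller can rely on the streaming state having flipped by the time runSwarmCommandSketch returns its promise.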
The swarm card finalized every is_error tool result as 'cancelled' with a success-toned bullet, and the dashboard suppresses the result body, so ordinary failures (planner produced no valid plan, synthesizer error) rendered as a clean "cancelled" with the real "Swarm failed: ..." reason hidden from the user. SwarmTool now distinguishes a genuine cancel (ctx.signal aborted) from an ordinary failure: on a real failure it emits a 'failed' swarm progress event carrying the reason before returning the error result. The TUI adds a terminal 'failed' phase (error bullet, ' · failed' tag, and a "✗ reason" body line); finalizeSwarmModelIfNeeded only forces 'cancelled' when the model is not already 'failed', so a genuine abort still shows 'cancelled'.
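A small sketch of the finalization rule described here, assuming a reduced model with only a phase and an optional reason. The type and function names are hypothetical; the rule itself (force cancelled only when the model is not already failed) is the one stated in the commit message.

```typescript
// Hypothetical reduced model for illustration.
type SwarmPhaseSketch = 'running' | 'done' | 'failed' | 'cancelled';

interface SwarmModelSketch {
  phase: SwarmPhaseSketch;
  reason?: string;
}

// Called when the tool result arrives with is_error set.
// A 'failed' progress event seen earlier wins over the generic cancel path.
function finalizeOnErrorSketch(model: SwarmModelSketch): SwarmModelSketch {
  if (model.phase === 'failed') return model;
  return { ...model, phase: 'cancelled' };
}
```

An ordinary failure therefore keeps its reason for the dashboard, while a genuine abort, which never emits the failed event, still ends as cancelled.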
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a17cfeee2d
    this.swarmModel = applySwarmEvent(this.swarmModel, {
      t: 'done',
      succeeded: this.swarmModel.doneCount,
      failed: this.swarmModel.failedCount,
    });
Preserve completed swarm state on replay
When a completed /swarm turn is rendered from session history, replay only reconstructs the tool call/result from messages and does not replay the live tool.progress or subagent.* events that populated swarmModel; this fallback therefore finalizes an empty initial model as done with zero workers/counts. Since Swarm cards also suppress the normal tool result body, every resumed completed swarm shows an inaccurate 0 workers · 0✓ 0✗ dashboard instead of the actual worker outcome. Please either persist/replay the swarm progress state or fall back to rendering the result body when no worker rows were reconstructed.
    if (this.swarmModel !== undefined) {
      this.buildSwarmBody();
      return;
Move Swarm rendering out of ToolCallComponent
apps/kimi-code/AGENTS.md says new tool-result display should extend components/messages/tool-renderers/registry.ts and the corresponding renderer, and should not stack branches inside ToolCallComponent; this branch adds the Swarm-specific dashboard branch directly in the central component. That makes future tool-specific UI continue to accumulate in this already-large class instead of the documented renderer path, so please move the Swarm display behind a dedicated renderer/component boundary.
    // dropped as agent_busy.
    host.beginSessionRequest();
    try {
      await session.prompt(buildSwarmPrompt(task));
Show the swarm request in the transcript
This starts a real model turn but, unlike the normal send path, never appends the user's /swarm task to the live transcript before calling session.prompt. In a live session the user sees a Swarm tool card with no preceding user request, and after resume the replayed user message comes from the internal buildSwarmPrompt(...) wrapper instead of the command/task the user actually entered; add an explicit transcript entry for the swarm request before dispatching the prompt.
    typeof o['role'] !== 'string' ||
    typeof o['systemPrompt'] !== 'string' ||
    typeof o['prompt'] !== 'string'
Reject empty planner subtask fields
When the planner returns syntactically valid JSON but leaves role, systemPrompt, or prompt as an empty string, this parser accepts the plan instead of retrying, so the coordinator can spawn a swarm: worker with no role/instructions and synthesize arbitrary or useless output. Treat trimmed-empty required fields as invalid here, matching the stricter reviser parsing, so the existing planner retry handles malformed plans.
parsePlan accepted a syntactically valid plan whose role, systemPrompt, or prompt was an empty (or whitespace-only) string, only checking the type. Such a subtask spawns a swarm worker with no identity and no instructions — a wasted run with a blank dashboard row and a low-value contribution to synthesis. Reject empty/whitespace-only required fields so decompose's existing planner retry re-prompts for a valid plan, mirroring the non-empty validation parseReviseDecision already applies to the reviser's output.
/swarm drove a model turn via session.prompt(buildSwarmPrompt(task)) without ever putting the user's request in the transcript, so a live session showed a Swarm tool card with no preceding user line. Append a readable "/swarm <task>" user entry before starting the turn, mirroring the normal send path. Adds appendUserTranscriptEntry to the slash-command host for framed commands that prompt the model with an internal wrapper.
The swarm dashboard was stacked as branches inside ToolCallComponent, against the guideline that new tool-result display should live behind a dedicated renderer/component boundary. The static tool-renderer registry can't host it (the dashboard needs live, per-event updates on a single stable component; folding it in was how the original duplicate-card bug was fixed), so extract it into a dedicated top-level SwarmCard selected at tool-call-start, hosted by the same managed lifecycle.
- New SwarmCard (sibling to ToolCallComponent) + a narrow ManagedToolCard interface the streaming-UI registry is typed against; shared helpers (str/formatTokens/SWARM_ACTIVITY_MAX_LENGTH) moved to tool-call-shared.
- streaming-ui selects SwarmCard for name==='Swarm' at the single creation point; replay routes Swarm entries to SwarmCard too.
- session-event-handler narrows the generic subagent path to ToolCallComponent after the swarm guard.
- ToolCallComponent loses all swarm code (isSwarm() now returns false).
Behavior-preserving: same single stable card, in-place mutation, static bullet, header from live args. Full suite green; swarm tests retargeted to SwarmCard.
Session resume reconstructs a swarm card from the tool call + final result, but the live tool.progress / subagent.* events that populate the dashboard are not replayed. A resumed completed swarm therefore finalized an empty model and rendered "0 workers · 0✓ 0✗" with the synthesized report — the actual deliverable — hidden entirely. When the card has no live worker data (nothing was replayed), render the result body instead of the empty dashboard and drop the misleading worker tail. A live whole-swarm failure still shows its reason via the existing '✗ <reason>' line; live runs (which always have worker data) are unaffected.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1461143743
    // dropped as agent_busy.
    host.beginSessionRequest();
    try {
      await session.prompt(buildSwarmPrompt(task));
Persist the readable /swarm request for replay
Fresh evidence after the live-transcript fix: this still records the verbose buildSwarmPrompt(task) as a normal user prompt, and replay renders user-origin messages directly (apps/kimi-code/src/tui/controllers/session-replay.ts:254-256). The manual appendUserTranscriptEntry('/swarm ...') only affects the current in-memory transcript, so after resuming/exporting history the turn shows the internal “Use the Swarm tool…” wrapper instead of the command/task the user actually entered; send the framed prompt with a non-user/internal origin or persist a replayable readable request alongside it.
    ): ResolvedAgentProfile {
      return {
        name,
        systemPrompt: () => override.systemPrompt,
Preserve runtime context in swarm override profiles
When profileOverride is used for swarm planner/worker/synthesizer agents, this renderer ignores the SystemPromptContext that normal profiles use to inject the current working directory, AGENTS.md instructions, available skills, OS/date, and other safety/tool-use guidance. For /swarm tasks that inspect a repository, workers therefore run with only planner-generated role text and can miss repo-specific constraints that ordinary Agent subagents receive; compose the override text with the base profile renderer or otherwise include the runtime context instead of replacing it entirely.
       * reviser).
       */
      private async reviseSubtask(st: Subtask): Promise<ReviseDecision> {
        const out = await this.deps.spawnSubagent({
Drop the subtask when recovery revision fails
When a worker fails and the reviser subagent itself errors (for example a transient model failure or max_tokens before returning its JSON), this await propagates out of runWithRetries, causing the whole Swarm tool to fail and skip synthesis even if other workers already succeeded. Since malformed reviser output is already treated as a conservative drop, non-abort reviser exceptions should be handled the same way so one failed recovery decision does not discard the rest of the swarm results.
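One way to express the suggested handling, as a sketch. The wrapper name reviseOrDropSketch and the isAborted callback are hypothetical; the idea from the comment is only that non-abort reviser exceptions become a conservative drop while genuine aborts still propagate.

```typescript
// Reduced decision type for illustration; the real ReviseDecision has more kinds.
type DecisionSketch = { kind: 'retry' } | { kind: 'drop' };

async function reviseOrDropSketch(
  revise: () => Promise<DecisionSketch>,
  isAborted: () => boolean, // hypothetical stand-in for checking the abort signal
): Promise<DecisionSketch> {
  try {
    return await revise();
  } catch (err) {
    // A genuine cancel must still unwind the whole swarm.
    if (isAborted()) throw err;
    // Any other reviser failure only costs this one subtask.
    return { kind: 'drop' };
  }
}
```

With this shape, a transient model error in the reviser no longer discards the results of workers that already succeeded.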
      if (typeof o['prompt'] !== 'string' || o['prompt'].length === 0) return null;
      return { kind: 'regenerate', prompt: o['prompt'] };
    case 'reassign': {
      if (typeof o['role'] !== 'string' || o['role'].length === 0) return null;
      if (typeof o['systemPrompt'] !== 'string' || o['systemPrompt'].length === 0) return null;
Reject whitespace-only reviser fields
If the reviser returns valid JSON with "prompt":" " for regenerate, or whitespace-only role/systemPrompt for reassign, these checks accept it because they only test .length. The next wave then runs a worker with effectively blank instructions or a blank role/profile name, producing arbitrary output instead of dropping/retrying the malformed recovery decision; validate the trimmed strings here as is done for planner subtasks.
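A sketch of trimmed validation for reviser decisions. The decision shapes are inferred from the snippet above and from the decision kinds mentioned elsewhere in this conversation; the exact fields of each kind, and the names isNonBlank and parseReviseDecisionSketch, are assumptions.

```typescript
// Assumed decision shapes; the real ReviseDecision may carry more fields.
type ReviseDecisionSketch =
  | { kind: 'retry' }
  | { kind: 'drop' }
  | { kind: 'regenerate'; prompt: string }
  | { kind: 'reassign'; role: string; systemPrompt: string };

// True only for strings with at least one non-whitespace character.
function isNonBlank(v: unknown): v is string {
  return typeof v === 'string' && v.trim().length > 0;
}

function parseReviseDecisionSketch(raw: string): ReviseDecisionSketch | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const o = parsed as Record<string, unknown>;

  switch (o['kind']) {
    case 'retry':
      return { kind: 'retry' };
    case 'drop':
      return { kind: 'drop' };
    case 'regenerate': {
      const prompt = o['prompt'];
      if (!isNonBlank(prompt)) return null; // rejects "" and "   "
      return { kind: 'regenerate', prompt };
    }
    case 'reassign': {
      const role = o['role'];
      const systemPrompt = o['systemPrompt'];
      if (!isNonBlank(role) || !isNonBlank(systemPrompt)) return null;
      return { kind: 'reassign', role, systemPrompt };
    }
    default:
      return null;
  }
}
```

A null result can then be treated like any other malformed reviser output.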
Problem
Broad, parallelizable tasks (multi-file analysis, multi-angle research) run today as a single sequential agent loop. The existing Agent subagent tool can spawn parallel subagents, but there is no built-in orchestration that decomposes a task, fans out heterogeneous role-specialized workers, and synthesizes their results into one answer.

What changed
Adds a /swarm <task> command (Phase 1) that runs a task as a self-directed agent swarm, client-side, on top of the existing subagent primitives:
- /swarm command (apps/kimi-code): sends a swarm-framed prompt via session.prompt(), driving a new server-side Swarm tool.
- Swarm tool (agent-core): runs a code-driven SwarmCoordinator, which runs swarm:<role> worker subagents concurrently (concurrency cap 4), each with a dynamically generated role: a custom system prompt plus a sanitized read-only tool subset via a new profileOverride on SessionSubagentHost.spawn. Progress updates go through ctx.onUpdate (the existing tool-progress channel).
- The Swarm tool is registered only on non-sub agents (type !== 'sub'), and worker tool sets are filtered against a read-only allowlist (ALLOWED_WORKER_TOOLS), so a worker can never obtain Swarm/Agent and spawn a nested swarm.

Scope: Phase 1 is a single-wave swarm with read-only workers. Failed subtasks are recorded and surfaced in synthesis (no auto-retry/reassign); multi-wave coordination, write-capable workers with approval, and a dedicated TUI panel are deferred to later phases.

Tests: unit coverage for the plan parser, concurrency helper, coordinator (plan/parallel/synthesize, planning retry, abort propagation, tool-allowlist sanitization), and the Swarm tool + command. Full suite green.
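For readers unfamiliar with the two mechanisms named in the description, here is a rough sketch of a concurrency-capped map and an allowlist filter. Everything here is illustrative: mapWithConcurrencySketch, sanitizeToolsSketch, and the tool names in the allowlist are hypothetical and do not reflect the actual helper or the actual contents of ALLOWED_WORKER_TOOLS.

```typescript
// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrencySketch<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical allowlist contents, for illustration only.
const ALLOWED_WORKER_TOOLS_SKETCH: ReadonlySet<string> = new Set(['Read', 'Grep', 'Glob']);

// Keep only allowlisted tools, so orchestration tools never reach a worker.
function sanitizeToolsSketch(requested: readonly string[]): string[] {
  return requested.filter((name) => ALLOWED_WORKER_TOOLS_SKETCH.has(name));
}
```

In this sketch a rejected fn call rejects the whole map; a coordinator that records failed subtasks would instead catch inside fn and return a failure record.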