feat: add /swarm parallel agent-swarm orchestration #208

Open
RealKai42 wants to merge 31 commits into main from kaiyi/karachi

@RealKai42
Collaborator

Problem

Broad, parallelizable tasks (multi-file analysis, multi-angle research) run today as a single sequential agent loop. The existing Agent subagent tool can spawn parallel subagents, but there is no built-in orchestration that decomposes a task, fans out heterogeneous role-specialized workers, and synthesizes their results into one answer.

What changed

Adds a /swarm <task> command (Phase 1) that runs a task as a self-directed agent swarm, client-side, on top of the existing subagent primitives:

  • /swarm command (apps/kimi-code): sends a swarm-framed prompt via session.prompt(), driving a new server-side Swarm tool.
  • Swarm tool (agent-core) runs a code-driven SwarmCoordinator:
    1. spawns a planner subagent that emits a JSON decomposition plan (parsed + one retry);
    2. fans out swarm:<role> worker subagents concurrently (concurrency cap 4), each with a dynamically generated role — custom system prompt + a sanitized read-only tool subset via a new profileOverride on SessionSubagentHost.spawn;
    3. spawns a synthesizer subagent to merge worker outputs into the final answer.
  • Progress streams via ctx.onUpdate (existing tool-progress channel).
  • Recursion guard: the Swarm tool is registered only on non-sub agents (type !== 'sub'), and worker tool sets are filtered against a read-only allowlist (ALLOWED_WORKER_TOOLS), so a worker can never obtain Swarm/Agent and spawn a nested swarm.
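The fan-out in step 2 and the allowlist filter in the recursion guard could look roughly like the sketch below. Only the concurrency cap of 4 and the name ALLOWED_WORKER_TOOLS come from the description above; `mapWithConcurrency`, `sanitizeWorkerTools`, and the tool names in the set are hypothetical stand-ins, not the PR's actual code.

```typescript
// Hypothetical allowlist contents; the PR only names the constant.
const ALLOWED_WORKER_TOOLS: ReadonlySet<string> = new Set(['Read', 'Grep', 'Glob']);

// Keep only allowlisted tools, so Swarm/Agent can never reach a worker.
function sanitizeWorkerTools(requested: readonly string[]): string[] {
  return requested.filter((name) => ALLOWED_WORKER_TOOLS.has(name));
}

// Run `fn` over `items` with at most `limit` calls in flight.
// Results are returned in input order, not completion order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }
  const laneCount = Math.max(0, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With a helper of this shape, the coordinator would pass the plan's subtasks, a limit of 4, and a worker-spawning callback. Note that in this sketch a rejected callback rejects the whole call, whereas the PR records failed subtasks instead.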

Scope: Phase 1 is a single-wave swarm with read-only workers. Failed subtasks are recorded and surfaced in synthesis (no auto-retry/reassign); multi-wave coordination, write-capable workers with approval, and a dedicated TUI panel are deferred to later phases.

Tests: unit coverage for the plan parser, concurrency helper, coordinator (plan/parallel/synthesize, planning retry, abort propagation, tool-allowlist sanitization), and the Swarm tool + command. Full suite green.

@changeset-bot

changeset-bot Bot commented May 29, 2026

🦋 Changeset detected

Latest commit: 1461143

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name                            Type
@moonshot-ai/agent-core         Minor
@moonshot-ai/kimi-code          Minor
@moonshot-ai/migration-legacy   Patch

@pkg-pr-new

pkg-pr-new Bot commented May 29, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@1461143
npx https://pkg.pr.new/@moonshot-ai/kimi-code@1461143

commit: 1461143

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc5e4bf787

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if (typeof parsed !== 'object' || parsed === null) return null;

const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

P2 Badge Enforce the planner's subtask cap

When the planner returns valid JSON with more than the prompted maximum of 6 subtasks, this accepts the entire array; SwarmCoordinator.runWave then iterates every entry and spawns a subagent for each one, only limiting concurrent workers to 4. In the common failure mode where the LLM ignores the cap or emits a large accidental list, /swarm can launch dozens of subagents and burn substantial time/tokens instead of retrying or rejecting the invalid plan. Please validate subtasksRaw.length <= 6 here (or truncate deliberately) before spawning workers.
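A minimal sketch of the check this comment asks for, built on the two lines quoted above. The constant name `MAX_SUBTASKS` and the function name `extractSubtasks` are assumptions; the value 6 is the cap the comment attributes to the planner prompt. Rejecting (returning null) is shown here so the existing planner retry can fire; truncating is the other option the comment mentions.

```typescript
// Assumed constant name; 6 is the cap quoted in the review comment.
const MAX_SUBTASKS = 6;

// Returns the raw subtask array, or null when the plan is unusable.
function extractSubtasks(parsed: unknown): unknown[] | null {
  if (typeof parsed !== 'object' || parsed === null) return null;

  const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
  if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

  // Reject oversized plans before any worker is spawned.
  if (subtasksRaw.length > MAX_SUBTASKS) return null;

  return subtasksRaw;
}
```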

Comment thread packages/agent-core/src/tools/builtin/collaboration/swarm.ts Outdated
return;
}
try {
await session.prompt(buildSwarmPrompt(task));

P2 Badge Handle sessions whose active tools lack Swarm

This directly prompts the current session to call Swarm, but resumed sessions created before this commit replay their old tools.set_active_tools record from the wire, so their active tool list does not include the newly added Swarm entry from agent.yaml. In that context /swarm <task> is accepted by the TUI but the model is asked to use a tool that is not exposed, so the command fails or devolves into normal chat; migrate old agent tool lists or check tool availability before sending this framed prompt.

RealKai42 added 18 commits May 29, 2026 17:36
The stall-detection repeat key joined the tool name and canonical args
with a literal NUL (0x00) separator. The control byte caused git to
classify stall-hook.ts as binary, so diffs, blame, and code review on
the file were opaque — which prevented confirming the test history for
this feature. Replace the NUL with a normal space (tool names are
identifiers and never contain spaces, so keys stay collision-free) so
the file is plain UTF-8 text and remains reviewable.
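As an illustration of the key described in this commit message, a space-separated repeat key might be built as below. This is a sketch under assumptions: `repeatKey` and `canonicalize` are hypothetical names, and "canonical args" is taken here to mean JSON with recursively sorted object keys, which the commit message does not spell out.

```typescript
// Assumed canonical form: object keys sorted recursively, arrays kept in order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => canonicalize(item));
  if (typeof value === 'object' && value !== null) {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src).sort()) out[key] = canonicalize(src[key]);
    return out;
  }
  return value;
}

// Tool names contain no spaces, so the first space always ends the name
// and two different (name, args) pairs cannot produce the same key.
function repeatKey(toolName: string, args: unknown): string {
  return `${toolName} ${JSON.stringify(canonicalize(args))}`;
}
```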

Behavior is unchanged: the key still uniquely combines tool name and
canonical args. Verified by reverting the hook to a no-op stub to show
the three stall-detection test files go red (the discriminating block,
canonical-key, e2e turn-abort, and worker-stall-translation cases all
fail), then restoring the real implementation to confirm they pass. This
is the failing-first evidence that the prior atomic commit never recorded.

Full suite: 5049 passed / 25 skipped; make typecheck clean.

The summary-continuation pass re-prompted any subagent whose first
summary was under 200 chars to "expand" it, then read back the
follow-up turn — replacing the original output rather than appending.

For swarm's structured-output subagents this was harmful: a reviser's
compact decision JSON (e.g. {"kind":"retry"}) is always under the
threshold, so the expand turn always fired and could replace the JSON
with prose, silently degrading the recovery loop into conservative
drops. It also taxed every short-but-complete handoff with an extra
turn.

Remove the heuristic entirely so a subagent's first summary is returned
as-is. The max-tokens truncation guard is unaffected.
@RealKai42
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc9176b3d0

Comment on lines +24 to +25
const subtasksRaw = (parsed as { subtasks?: unknown }).subtasks;
if (!Array.isArray(subtasksRaw) || subtasksRaw.length === 0) return null;

P2 Badge Enforce the advertised subtask cap

The planner prompt says to keep the plan to at most 6 subtasks, but this parser accepts any non-empty subtasks array. If the model returns dozens or hundreds of items, runWithRetries will execute every one of them (bounded only by concurrency), which can turn a single /swarm call into unexpected token/tool spend and a very long run instead of retrying/rejecting the invalid plan.

Comment on lines +26 to +27
try {
await session.prompt(buildSwarmPrompt(task));

P2 Badge Route /swarm through the normal send lifecycle

This calls session.prompt directly, so the TUI never runs the normal sendMessageInternal setup (beginSessionRequest, streaming state, transcript entry, and queue handling). During the initial model latency before any SDK event arrives, the app still considers itself idle, so another user input or idle-only slash command can be accepted and race with the swarm turn instead of being blocked/queued like a normal prompt.
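One way to close that idle window is to flip the busy state synchronously before the first await. The sketch below assumes a host with `beginSessionRequest` (named in this comment) and a `failSessionRequest` counterpart; the interfaces, the `failSessionRequest` signature, and `runSwarmCommand` itself are hypothetical and only show the ordering.

```typescript
// Assumed shapes for illustration only.
interface SwarmCommandHost {
  beginSessionRequest(): void;
  failSessionRequest(error: unknown): void;
}

interface PromptSession {
  prompt(text: string): Promise<void>;
}

async function runSwarmCommand(
  host: SwarmCommandHost,
  session: PromptSession,
  framedPrompt: string,
): Promise<void> {
  // Runs before any await, so the input gate is closed by the time
  // control returns to the caller.
  host.beginSessionRequest();
  try {
    await session.prompt(framedPrompt);
  } catch (error) {
    // Reopen the gate so a rejected prompt does not leave the UI stuck.
    host.failSessionRequest(error);
  }
}
```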

await runChildTurnToCompletion(child, options.signal);
result = lastAssistantText(child);
}
const result = lastAssistantText(child);

P2 Badge Preserve continuation for regular subagents

This now returns the first assistant message for every subagent, removing the previous follow-up that expanded summaries shorter than 200 characters. That may be useful for swarm planner/reviser JSON, but it also changes normal Agent/explore subagents: a terse answer such as “Done” is handed back to the parent without the bounded expansion turn, leaving the parent under-informed. Scope the raw-result behavior to the swarm/profileOverride path rather than all subagents.
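The scoping suggested here could be expressed as a small predicate. This is only a sketch of the reviewer's proposal, not what the PR does (the commit removes the heuristic entirely); `shouldExpandSummary` and the `hasProfileOverride` flag are hypothetical, while the 200-character threshold is the one quoted in this thread.

```typescript
// Threshold quoted in the thread for the old expansion heuristic.
const MIN_SUMMARY_CHARS = 200;

// Decide whether a subagent's first summary should get a follow-up
// "expand" turn. Override-profile (swarm) subagents are always returned
// as-is so compact structured output such as decision JSON survives.
function shouldExpandSummary(summary: string, hasProfileOverride: boolean): boolean {
  if (hasProfileOverride) return false;
  return summary.trim().length < MIN_SUMMARY_CHARS;
}
```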

Comment on lines +684 to +685
if (result.is_error === true) {
this.swarmModel = applySwarmEvent(this.swarmModel, { t: 'cancelled' });

P2 Badge Preserve non-cancel swarm errors

This treats every Swarm error result as cancelled, but SwarmTool returns isError for ordinary failures too, such as planner JSON failures or synthesizer errors. Because swarm cards also skip rendering the tool result body, those failures are displayed as a cancelled dashboard with the actual Swarm failed: ... message hidden from the user; only genuine abort/cancel errors should take this path.

// visible but mark it retrying (an in-flight, uncounted state) so the
// re-spawn can collapse onto it. Carries no subagent id, so we match by
// role against the most recent terminal/retrying row.
const prior = findReusableRoleRow(model.workers, event.role);

P3 Badge Correlate retries by subtask, not role

When a plan contains two subtasks with the same role and both reach a terminal state, recovery events only carry the role to the reducer, so findReusableRoleRow can mark/re-key/drop the wrong row (the most recently inserted matching role) even though the coordinator emitted a distinct subtaskId. This makes the swarm dashboard inaccurate for duplicate-role plans; use the subtask identity or preserve a subtask-to-worker mapping instead of matching solely by role.
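To make the suggestion concrete, a lookup keyed on subtask identity might look like the sketch below. The `WorkerRow` shape, its status values, and `findReusableRow` are assumptions for illustration; the comment only establishes that the coordinator emits a distinct subtaskId.

```typescript
// Assumed row shape; the real dashboard model is not shown in this thread.
interface WorkerRow {
  subtaskId: string;
  role: string;
  status: 'running' | 'done' | 'failed' | 'retrying';
}

// Match on the subtask id rather than the role, so two subtasks that
// share a role can never be confused with each other.
function findReusableRow(
  workers: readonly WorkerRow[],
  subtaskId: string,
): WorkerRow | undefined {
  return workers.find(
    (row) => row.subtaskId === subtaskId && row.status !== 'running',
  );
}
```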

RealKai42 added 2 commits May 30, 2026 00:51
handleSwarmCommand called session.prompt directly, bypassing
beginSessionRequest. streamingPhase therefore stayed 'idle' until the
SDK turn.started event round-tripped back, leaving a startup window in
which a fast follow-up message was dispatched as a second concurrent
prompt and silently dropped by the core as agent_busy, and in which the
UI showed no waiting state.

Call beginSessionRequest() before prompting — flipping streamingPhase
synchronously so the input gate closes immediately and the waiting pane
shows — and failSessionRequest() on a prompt rejection, mirroring
sendSkillActivation / handleInitCommand.

The swarm card finalized every is_error tool result as 'cancelled' with
a success-toned bullet, and the dashboard suppresses the result body, so
ordinary failures (planner produced no valid plan, synthesizer error)
rendered as a clean "cancelled" with the real "Swarm failed: ..." reason
hidden from the user.

SwarmTool now distinguishes a genuine cancel (ctx.signal aborted) from an
ordinary failure: on a real failure it emits a 'failed' swarm progress
event carrying the reason before returning the error result. The TUI adds
a terminal 'failed' phase (error bullet, ' · failed' tag, and a "✗ reason"
body line); finalizeSwarmModelIfNeeded only forces 'cancelled' when the
model is not already 'failed', so a genuine abort still shows 'cancelled'.
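The cancel/failure split described in this commit message can be sketched as a pair of small functions. The type names, field names, and function names below are hypothetical; only the 'failed' and 'cancelled' phases and the rule "force 'cancelled' only when not already 'failed'" come from the message.

```typescript
type SwarmPhase = 'running' | 'done' | 'failed' | 'cancelled';

// Assumed minimal model; the real swarm model carries worker rows too.
interface SwarmModel {
  phase: SwarmPhase;
  reason?: string;
}

type SwarmErrorEvent = { t: 'failed'; reason: string } | { t: 'cancelled' };

// Tool side: an aborted signal is a genuine cancel, anything else a failure.
function errorEventFor(signalAborted: boolean, reason: string): SwarmErrorEvent {
  return signalAborted ? { t: 'cancelled' } : { t: 'failed', reason };
}

// Reducer step for the progress event emitted before the error result.
function applyErrorEvent(model: SwarmModel, event: SwarmErrorEvent): SwarmModel {
  if (event.t === 'failed') return { phase: 'failed', reason: event.reason };
  return { ...model, phase: 'cancelled' };
}

// TUI side: when the is_error result arrives, keep an existing 'failed'
// phase and only fall back to 'cancelled' otherwise.
function finalizeOnErrorResult(model: SwarmModel): SwarmModel {
  return model.phase === 'failed' ? model : { ...model, phase: 'cancelled' };
}
```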

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a17cfeee2d

Comment on lines +693 to +697
this.swarmModel = applySwarmEvent(this.swarmModel, {
t: 'done',
succeeded: this.swarmModel.doneCount,
failed: this.swarmModel.failedCount,
});

P2 Badge Preserve completed swarm state on replay

When a completed /swarm turn is rendered from session history, replay only reconstructs the tool call/result from messages and does not replay the live tool.progress or subagent.* events that populated swarmModel; this fallback therefore finalizes an empty initial model as done with zero workers/counts. Since Swarm cards also suppress the normal tool result body, every resumed completed swarm shows an inaccurate 0 workers · 0✓ 0✗ dashboard instead of the actual worker outcome. Please either persist/replay the swarm progress state or fall back to rendering the result body when no worker rows were reconstructed.

Comment on lines +1748 to +1750
if (this.swarmModel !== undefined) {
this.buildSwarmBody();
return;

P2 Badge Move Swarm rendering out of ToolCallComponent

apps/kimi-code/AGENTS.md says new tool-result display should extend components/messages/tool-renderers/registry.ts and the corresponding renderer, and should not stack branches inside ToolCallComponent; this branch adds the Swarm-specific dashboard branch directly in the central component. That makes future tool-specific UI continue to accumulate in this already-large class instead of the documented renderer path, so please move the Swarm display behind a dedicated renderer/component boundary.

// dropped as agent_busy.
host.beginSessionRequest();
try {
await session.prompt(buildSwarmPrompt(task));

P2 Badge Show the swarm request in the transcript

This starts a real model turn but, unlike the normal send path, never appends the user's /swarm task to the live transcript before calling session.prompt. In a live session the user sees a Swarm tool card with no preceding user request, and after resume the replayed user message comes from the internal buildSwarmPrompt(...) wrapper instead of the command/task the user actually entered; add an explicit transcript entry for the swarm request before dispatching the prompt.

Comment on lines +33 to +35
typeof o['role'] !== 'string' ||
typeof o['systemPrompt'] !== 'string' ||
typeof o['prompt'] !== 'string'

P2 Badge Reject empty planner subtask fields

When the planner returns syntactically valid JSON but leaves role, systemPrompt, or prompt as an empty string, this parser accepts the plan instead of retrying, so the coordinator can spawn a swarm: worker with no role/instructions and synthesize arbitrary or useless output. Treat trimmed-empty required fields as invalid here, matching the stricter reviser parsing, so the existing planner retry handles malformed plans.
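A sketch of the stricter check, treating trimmed-empty strings as invalid. The reduced `Subtask` shape (just the three fields quoted above) and the names `isNonBlank` and `parseSubtask` are assumptions; the real parser may carry more fields.

```typescript
// Reduced to the three required fields named in the review comment.
interface Subtask {
  role: string;
  systemPrompt: string;
  prompt: string;
}

// True only for strings with at least one non-whitespace character.
function isNonBlank(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

// Returns null for malformed entries so the caller's planner retry can fire.
function parseSubtask(raw: unknown): Subtask | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const o = raw as Record<string, unknown>;
  const role = o['role'];
  const systemPrompt = o['systemPrompt'];
  const prompt = o['prompt'];
  if (!isNonBlank(role) || !isNonBlank(systemPrompt) || !isNonBlank(prompt)) {
    return null;
  }
  return { role, systemPrompt, prompt };
}
```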

RealKai42 added 4 commits May 30, 2026 14:01
parsePlan accepted a syntactically valid plan whose role, systemPrompt,
or prompt was an empty (or whitespace-only) string, only checking the
type. Such a subtask spawns a swarm worker with no identity and no
instructions — a wasted run with a blank dashboard row and a low-value
contribution to synthesis.

Reject empty/whitespace-only required fields so decompose's existing
planner retry re-prompts for a valid plan, mirroring the non-empty
validation parseReviseDecision already applies to the reviser's output.

/swarm drove a model turn via session.prompt(buildSwarmPrompt(task))
without ever putting the user's request in the transcript, so a live
session showed a Swarm tool card with no preceding user line.

Append a readable "/swarm <task>" user entry before starting the turn,
mirroring the normal send path. Adds appendUserTranscriptEntry to the
slash-command host for framed commands that prompt the model with an
internal wrapper.

The swarm dashboard was stacked as branches inside ToolCallComponent,
against the guideline that new tool-result display should live behind a
dedicated renderer/component boundary. The static tool-renderer registry
can't host it (the dashboard needs live, per-event updates on a single
stable component — folding it in was how the original duplicate-card bug
was fixed), so extract it into a dedicated top-level SwarmCard selected
at tool-call-start, hosted by the same managed lifecycle.

- New SwarmCard (sibling to ToolCallComponent) + a narrow ManagedToolCard
  interface the streaming-UI registry is typed against; shared helpers
  (str/formatTokens/SWARM_ACTIVITY_MAX_LENGTH) moved to tool-call-shared.
- streaming-ui selects SwarmCard for name==='Swarm' at the single creation
  point; replay routes Swarm entries to SwarmCard too.
- session-event-handler narrows the generic subagent path to
  ToolCallComponent after the swarm guard.
- ToolCallComponent loses all swarm code (isSwarm() now returns false).

Behavior-preserving: same single stable card, in-place mutation, static
bullet, header from live args. Full suite green; swarm tests retargeted
to SwarmCard.

Session resume reconstructs a swarm card from the tool call + final
result, but the live tool.progress / subagent.* events that populate the
dashboard are not replayed. A resumed completed swarm therefore finalized
an empty model and rendered "0 workers · 0✓ 0✗" with the synthesized
report — the actual deliverable — hidden entirely.

When the card has no live worker data (nothing was replayed), render the
result body instead of the empty dashboard and drop the misleading worker
tail. A live whole-swarm failure still shows its reason via the existing
'✗ <reason>' line; live runs (which always have worker data) are
unaffected.
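The replay fallback described above amounts to a single branch at render time. The sketch below is illustrative only: `SwarmCardState`, its fields, and `renderCardLines` are hypothetical, and the dashboard line format is copied from the "0 workers · 0✓ 0✗" string quoted in this commit message.

```typescript
// Assumed minimal card state for illustration.
interface SwarmCardState {
  workers: readonly { role: string; ok: boolean }[];
  resultBody: string;
}

function renderCardLines(state: SwarmCardState): string[] {
  // Nothing was replayed: show the synthesized report instead of an
  // empty dashboard.
  if (state.workers.length === 0) {
    return state.resultBody.split('\n');
  }
  // Live run: worker data exists, so the dashboard summary is accurate.
  const ok = state.workers.filter((w) => w.ok).length;
  const failed = state.workers.length - ok;
  return [`${state.workers.length} workers · ${ok}✓ ${failed}✗`];
}
```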

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1461143743

// dropped as agent_busy.
host.beginSessionRequest();
try {
await session.prompt(buildSwarmPrompt(task));

P2 Badge Persist the readable /swarm request for replay

Fresh evidence after the live-transcript fix: this still records the verbose buildSwarmPrompt(task) as a normal user prompt, and replay renders user-origin messages directly (apps/kimi-code/src/tui/controllers/session-replay.ts:254-256). The manual appendUserTranscriptEntry('/swarm ...') only affects the current in-memory transcript, so after resuming/exporting history the turn shows the internal “Use the Swarm tool…” wrapper instead of the command/task the user actually entered; send the framed prompt with a non-user/internal origin or persist a replayable readable request alongside it.

): ResolvedAgentProfile {
return {
name,
systemPrompt: () => override.systemPrompt,

P2 Badge Preserve runtime context in swarm override profiles

When profileOverride is used for swarm planner/worker/synthesizer agents, this renderer ignores the SystemPromptContext that normal profiles use to inject the current working directory, AGENTS.md instructions, available skills, OS/date, and other safety/tool-use guidance. For /swarm tasks that inspect a repository, workers therefore run with only planner-generated role text and can miss repo-specific constraints that ordinary Agent subagents receive; compose the override text with the base profile renderer or otherwise include the runtime context instead of replacing it entirely.
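Composing rather than replacing could be as simple as wrapping the base renderer. In this sketch the `SystemPromptContext` fields, the `PromptRenderer` type, and `composeOverridePrompt` are assumptions; the thread names SystemPromptContext but does not show its shape or how the base profile renders it.

```typescript
// Assumed subset of the runtime context; real fields are not shown here.
interface SystemPromptContext {
  cwd: string;
  agentsMd?: string;
}

type PromptRenderer = (ctx: SystemPromptContext) => string;

// Keep the base profile's runtime context and append the role text
// generated by the planner, instead of returning the override alone.
function composeOverridePrompt(
  base: PromptRenderer,
  overrideText: string,
): PromptRenderer {
  return (ctx) => `${base(ctx)}\n\n${overrideText}`;
}
```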

* reviser).
*/
private async reviseSubtask(st: Subtask): Promise<ReviseDecision> {
const out = await this.deps.spawnSubagent({

P2 Badge Drop the subtask when recovery revision fails

When a worker fails and the reviser subagent itself errors (for example a transient model failure or max_tokens before returning its JSON), this await propagates out of runWithRetries, causing the whole Swarm tool to fail and skip synthesis even if other workers already succeeded. Since malformed reviser output is already treated as a conservative drop, non-abort reviser exceptions should be handled the same way so one failed recovery decision does not discard the rest of the swarm results.
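A sketch of the handling this comment proposes: wrap the reviser call so that only aborts propagate. `reviseOrDrop` is a hypothetical wrapper, the `ReviseDecision` union is reduced to two variants, and the 'drop' kind name is an assumption (the thread mentions conservative drops but not the literal tag).

```typescript
// Reduced union; 'drop' is an assumed tag name.
type ReviseDecision = { kind: 'retry' } | { kind: 'drop' };

async function reviseOrDrop(
  revise: () => Promise<ReviseDecision>,
  signal: AbortSignal,
): Promise<ReviseDecision> {
  try {
    return await revise();
  } catch (error) {
    // A genuine cancel must still stop the whole swarm.
    if (signal.aborted) throw error;
    // Any other reviser failure only costs this one subtask.
    return { kind: 'drop' };
  }
}
```

With this shape, a transient reviser error would drop one subtask while synthesis still runs over the workers that succeeded.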

Comment on lines +87 to +91
if (typeof o['prompt'] !== 'string' || o['prompt'].length === 0) return null;
return { kind: 'regenerate', prompt: o['prompt'] };
case 'reassign': {
if (typeof o['role'] !== 'string' || o['role'].length === 0) return null;
if (typeof o['systemPrompt'] !== 'string' || o['systemPrompt'].length === 0) return null;

P2 Badge Reject whitespace-only reviser fields

If the reviser returns valid JSON with "prompt":" " for regenerate, or whitespace-only role/systemPrompt for reassign, these checks accept it because they only test .length. The next wave then runs a worker with effectively blank instructions or a blank role/profile name, producing arbitrary output instead of dropping/retrying the malformed recovery decision; validate the trimmed strings here as is done for planner subtasks.
