CS-10670: boxel-cli publishes tool definitions for factory consumption #4471

Open
FadhlanR wants to merge 3 commits into main from cs-10670-boxel-cli-publishes-tool-definitions-for-factory-consumption

@FadhlanR
Contributor

@FadhlanR FadhlanR commented Apr 22, 2026

Summary

boxel-cli is now the single source of truth for the realm-operation tools the factory consumes. The factory calls getToolDefinitions(client, config) and spreads the result into its FactoryTool[] via adaptBoxelTool, which applies enforceRealmSafety at the seam. The factory's parallel subprocess dispatch (BOXEL_CLI_TOOLS + executeBoxelCli) is retired.

What boxel-cli exports

getToolDefinitions(client, config) returns 14 tool definitions backed by BoxelCLIClient methods. Each has name, description, JSON-Schema parameters, and a pre-bound execute function — consumers don't need to know about realm URLs or auth.

| Tool | CLI command | Client method |
| --- | --- | --- |
| realm_read_file | boxel file read | client.read |
| realm_write_file | boxel file write | client.write |
| realm_delete_file | boxel file delete | client.delete |
| realm_list_files | boxel file list | client.listFiles |
| realm_lint_file | boxel file lint | client.lint |
| realm_search | boxel search | client.search |
| realm_sync | boxel realm sync | client.sync |
| realm_push | boxel realm push | client.push |
| realm_pull | boxel realm pull | client.pull |
| realm_wait_for_ready | boxel realm wait-for-ready | client.waitForReady |
| realm_cancel_indexing | boxel realm cancel-indexing | client.cancelAllIndexingJobs |
| create_realm | boxel realm create | client.createRealm |
| read_transpiled | boxel read-transpiled | client.readTranspiled |
| run_command | boxel run-command | client.runCommand |
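
The shape described above might look roughly like the following sketch. The four `ToolDefinition` fields mirror the ones the description lists (name, description, JSON-Schema parameters, pre-bound execute). Everything else is an assumption made for illustration: the `RealmClient` interface, the `Guard` type, the `path` parameter name, and the `*Sketch` function names are hypothetical stand-ins, not the actual boxel-cli or factory types.

```typescript
// Hypothetical stand-in for the subset of BoxelCLIClient used in this sketch.
interface RealmClient {
  read(realmUrl: string, path: string): Promise<string>;
}

// Assumed shape: the four fields named in the PR description.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
  execute(args: Record<string, unknown>): Promise<unknown>;
}

// A guard inspects a call and throws to reject it
// (a stand-in for what enforceRealmSafety does at the seam).
type Guard = (toolName: string, args: Record<string, unknown>) => void;

// Sketch of a definitions factory: execute closes over the client,
// so the consumer never handles auth itself.
function getToolDefinitionsSketch(client: RealmClient): ToolDefinition[] {
  return [
    {
      name: 'realm_read_file',
      description: 'Read a file from a realm.',
      parameters: {
        type: 'object',
        properties: {
          'realm-url': { type: 'string' },
          path: { type: 'string' }, // parameter name is an assumption
        },
        required: ['realm-url', 'path'],
      },
      execute: async (args) =>
        client.read(String(args['realm-url']), String(args['path'])),
    },
  ];
}

// Sketch of the adapter seam: same definition, with the guard run before execute.
function adaptToolSketch(tool: ToolDefinition, guard: Guard): ToolDefinition {
  return {
    ...tool,
    execute: async (args) => {
      guard(tool.name, args); // throws to reject the call
      return tool.execute(args);
    },
  };
}
```

Wrapping at the adapter keeps the published definitions free of factory-specific policy, which appears to be the point of applying the safety check "at the seam".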

Naming

verb_noun snake_case throughout, with a realm_ prefix on operations against an existing realm. create_realm puts the verb first because the realm doesn't yet exist to scope against. Tools without a target-workspace counterpart (read_transpiled, run_command) don't need the prefix.

The realm_ prefix on the file-op tools is load-bearing — it disambiguates from the factory-native read_file / write_file (target-workspace, no realm-url parameter), letting realm_read_file read as "the realm-scoped version of read_file".

What changed in the factory

  • Retired BOXEL_CLI_TOOLS, BOXEL_CLI_COMMAND_MAP, executeBoxelCli(), buildBoxelCliArgs(). ToolRegistry now owns only SCRIPT_TOOLS; the manifest category union narrowed to 'script'.
  • TARGET_REALM_BYPASS_TOOLS updated to realm_read_file / realm_write_file / realm_delete_file; safety guard logic unchanged.
  • Prompts (system.md, ticket-implement.md, ticket-iterate.md, bootstrap-implement.md), the software-factory-operations skill, test files, smoke scripts, and inline comments updated to the new tool names.
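
As a rough illustration of the bypass guard mentioned above, the sketch below rejects the three file-op tools when they are pointed at the target realm, since those files are expected to go through the synced workspace instead. The set contents come from the PR description; the function name, the trailing-slash normalization, and the error text are assumptions, and the real guard logic may differ.

```typescript
// Tool names taken from the PR description.
const TARGET_REALM_BYPASS_TOOLS = new Set([
  'realm_read_file',
  'realm_write_file',
  'realm_delete_file',
]);

// Assumption: realm URLs are compared with a trailing slash.
function withTrailingSlash(url: string): string {
  return url.endsWith('/') ? url : `${url}/`;
}

// Hypothetical guard: throws when a bypass tool targets the target realm.
function checkTargetRealmBypass(
  toolName: string,
  args: Record<string, unknown>,
  targetRealmUrl: string,
): void {
  if (!TARGET_REALM_BYPASS_TOOLS.has(toolName)) return;
  const requested = args['realm-url'];
  if (typeof requested !== 'string') return;
  if (withTrailingSlash(requested) === withTrailingSlash(targetRealmUrl)) {
    throw new Error(
      `${toolName} may not target the target realm; edit the workspace files instead`,
    );
  }
}
```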

Plumbing

  • client.push() added to BoxelCLIClient; commands/realm/push.ts now exports a clean push() lib function alongside the commander wrapper, mirroring how pull() and sync() are factored.
  • BoxelCLIClient.runCommand delegates to the standalone runCommand() in commands/run-command.ts (matching client.lint, client.listFiles, etc.).
  • requireStringArg() helper moved from factory to boxel-cli as a shared utility (still publicly exported).
  • packages/software-factory/package.json: removed orphaned volta.extends block (root has no Volta config) so pnpm test runs without a Volta error.
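
For readers unfamiliar with the helper, a minimal sketch of what a `requireStringArg`-style function could look like follows. The signature and error message are assumptions, not the exported API; the only behaviour taken from this thread is that the helper returns the trimmed value (a later review fix notes it returns `raw.trim()`).

```typescript
// Hypothetical signature; the real helper in boxel-cli may differ.
function requireStringArgSketch(
  args: Record<string, unknown>,
  key: string,
): string {
  const raw = args[key];
  // Reject missing, non-string, and whitespace-only values.
  if (typeof raw !== 'string' || raw.trim() === '') {
    throw new Error(`Missing required string argument: ${key}`);
  }
  // Return the trimmed value so callers never see padded input.
  return raw.trim();
}
```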

Test plan

  • pnpm --filter @cardstack/boxel-cli test — 169/169 unit tests passing, including 24 in tool-definitions.test.ts (covers all 14 tools + sync/push/pull builders). Integration tests skipped (require Synapse).
  • pnpm --filter @cardstack/boxel-cli lint:types — clean.
  • pnpm --filter @cardstack/software-factory test — 449/450 passing. The one failure is port-allocator > IPv4 holder blocks dual-stack bind, an OS-level dual-stack networking flake unrelated to this work.
  • Final grep — zero references to old kebab tool names (realm-read, realm-write, realm-delete, realm-search, realm-create, fetch_transpiled_module, list_files, lint_file, wait_for_ready, cancel_indexing) or retired symbols (BOXEL_CLI_TOOLS, executeBoxelCli, BOXEL_CLI_COMMAND_MAP, buildBoxelCliArgs) anywhere in source/tests/prompts/skills.
  • CI green

Linear

Closes CS-10670

🤖 Generated with Claude Code

@habdelra
Contributor

we should also incorporate the lint tool from #4479

@FadhlanR FadhlanR force-pushed the cs-10670-boxel-cli-publishes-tool-definitions-for-factory-consumption branch 4 times, most recently from fd93e64 to 57f2058 Compare April 29, 2026 06:16
@FadhlanR FadhlanR marked this pull request as ready for review April 29, 2026 06:30
@FadhlanR FadhlanR requested review from a team, habdelra and jurgenwerk April 29, 2026 06:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57f2058de8

Comment thread packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR makes @cardstack/boxel-cli the single source of truth for realm-operation tool definitions consumed by @cardstack/software-factory, replacing the factory’s prior registry/executor dispatch for realm-api + boxel-cli tools and standardizing tool naming (snake_case, realm_ prefix).

Changes:

  • Added getToolDefinitions(client, config) to boxel-cli to publish 14 tool definitions (name/description/JSON-schema/execute) backed by BoxelCLIClient.
  • Updated software-factory to build/adopt those boxel-cli tools via adaptBoxelTool (wrapping with enforceRealmSafety) and narrowed ToolRegistry/manifests to script tools only.
  • Updated tests, prompts, and docs to use the new tool names and wiring; removed obsolete factory boxel-cli subprocess plumbing.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| packages/software-factory/tests/factory-tool-registry.test.ts | Updates registry expectations now that only script tools are registered. |
| packages/software-factory/tests/factory-tool-executor.test.ts | Refactors tests to cover script-tool executor behavior and safety wrapping for adopted boxel-cli tools. |
| packages/software-factory/tests/factory-tool-executor.spec.ts | Updates Playwright tests to execute realm_* tools via adopted boxel-cli tool definitions rather than executor dispatch. |
| packages/software-factory/tests/factory-tool-executor.integration.test.ts | Updates integration tests to validate boxel-cli tool wire shape via getToolDefinitions. |
| packages/software-factory/tests/factory-tool-builder.test.ts | Updates builder tests to expect realm tools from boxel-cli tool definitions and new tool names. |
| packages/software-factory/src/harness/database.ts | Comment update for renamed realm_search tool. |
| packages/software-factory/src/factory-tool-registry.ts | Removes boxel-cli/realm-api manifests; registry now only owns script tool manifests. |
| packages/software-factory/src/factory-tool-executor.ts | Extracts realm safety guard into exported enforceRealmSafety + adds buildRealmSafetyConfig; executor dispatch narrowed to script tools. |
| packages/software-factory/src/factory-tool-builder.ts | Adopts all boxel-cli tool definitions into agent tool list via adaptBoxelTool + safety wrapping; removes factory-local realm/search/run-command wrappers. |
| packages/software-factory/src/factory-issue-loop-wiring.ts | Updates wiring to use script-only ToolRegistry and buildRealmSafetyConfig. |
| packages/software-factory/src/factory-agent/types.ts | Narrows ToolManifest.category to 'script'. |
| packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts | Partially renames tools, but still uses executor/registry assumptions that appear outdated. |
| packages/software-factory/prompts/ticket-iterate.md | Updates guidance to realm_search with explicit realm-url. |
| packages/software-factory/prompts/ticket-implement.md | Updates guidance to realm_search with explicit realm-url. |
| packages/software-factory/prompts/system.md | Updates system instructions to realm_search with explicit realm-url. |
| packages/software-factory/prompts/bootstrap-implement.md | Updates bootstrap instructions to realm_search with explicit realm-url. |
| packages/software-factory/package.json | Removes orphaned volta.extends config. |
| packages/software-factory/.agents/skills/software-factory-operations/SKILL.md | Updates tool names but has a parameter-signature mismatch for realm_search. |
| packages/boxel-cli/tests/lib/tool-definitions.test.ts | Adds unit tests for getToolDefinitions (14 tools) and option mapping. |
| packages/boxel-cli/src/lib/tool-definitions.ts | Adds published boxel-cli tool definitions and requireStringArg helper. |
| packages/boxel-cli/src/lib/boxel-cli-client.ts | Adds push() wrapper and delegates runCommand() to standalone command implementation. |
| packages/boxel-cli/src/commands/run-command.ts | Makes runCommand() return structured errors (vs throw) and supports overriding realm server URL. |
| packages/boxel-cli/src/commands/realm/push.ts | Factors push into reusable push() returning PushResult and keeps CLI wrapper behavior. |
| packages/boxel-cli/api.ts | Exports getToolDefinitions, requireStringArg, and related types from the public API. |

Comment thread packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Outdated
Comment thread packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Outdated
Comment thread packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts Outdated
Comment thread packages/boxel-cli/src/lib/tool-definitions.ts Outdated
Comment thread packages/software-factory/src/factory-tool-executor.ts Outdated
Comment thread packages/software-factory/.agents/skills/software-factory-operations/SKILL.md Outdated
Contributor

i would have expected buildReadFileTool and buildWriteFileTool to be removed as well. i think boxel-cli has taken over this completely and hence needs to define the tool now

Contributor Author

There are two types of write_file and read_file tools. The write_file and read_file tools operate on the local filesystem, while realm_write_file and realm_read_file handle reading and writing files to a realm via an endpoint. The buildWriteFileTool and buildReadFileTool that remain here are for local filesystem operations, while the realm-based implementations have been migrated to the Boxel CLI.

Contributor

@habdelra habdelra Apr 29, 2026

we don't need a tool tho to tell the agent how to read and write a file on the local file system: every agent should know how to do that. the best thing we can do here is to get out of the agent's way. it has vast experience in doing this from its training data.

Contributor

i see we have file read and file write commands too in the boxel-cli. that is unnecessary: the agent should know how to read and write a file from the local file system. you definitely see this when you use claude code: it issues cat commands and pipes into file descriptors. there is no need to tell the agent how to do this, especially by forcing it to use node. what it does is actually more efficient than any tool we can give it.

Contributor Author

Agreed that we don't need read_file and write_file as tools: the agent handles local file ops better with its native tools, so I removed them from factory-tool-builder. But the file read and write in boxel-cli are still needed, since they're not for local files. They hit the realm server over HTTP with the Matrix JWT to read/write files in non-target realms (scratch, source, catalog) that aren't synced locally, and the agent can't do that natively without realm credentials. The TARGET_REALM_BYPASS_TOOLS guard already blocks them from the target realm, so the agent only uses them for non-target work. Happy to rename if realm_*_file feels confusing.

Contributor

@habdelra habdelra Apr 30, 2026

don't we have a realm file read and realm file write for that purpose? we should be consistent with the name; naming consistency matters to the agent

Contributor

@jurgenwerk jurgenwerk Apr 30, 2026

> we don't need a tool tho to tell the agent how to read and write a file on the local file system: every agent should know how to do that.

But what about OpenRouter? I think this agent has no way to touch the filesystem unless we provide it with tools.

Contributor

really? i would think that all the agents used via OpenRouter have basic file system manipulation knowledge

FadhlanR added a commit that referenced this pull request Apr 30, 2026
- Remove workspace read_file/write_file factory tools (Hassan); agent
  uses its native filesystem tools on the workspace dir instead.
- Fix requireStringArg to return raw.trim() (Copilot).
- Tighten validateRealmTarget: parse with URL, require origin equality,
  then path-prefix-match within that origin — blocks subdomain-suffix
  SSRF like realms.example.test.evil.com (Copilot).
- Rewire factory-tools-smoke.ts to invoke realm tools via
  getToolDefinitions + adaptBoxelTool instead of ToolExecutor.execute,
  matching the runtime path; registry asserts script-only (Codex, Copilot).
- Update prompts and skills: realm_search now documented with required
  realm-url; target-realm I/O guidance points at native filesystem tools
  (Copilot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
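
The tightened realm-target check in the commit message above (parse with `URL`, require origin equality, then prefix-match the path within that origin) can be sketched as follows. This is an illustration of the technique only: the function name and the shape of the allow-list are assumptions, and the actual `validateRealmTarget` may be structured differently.

```typescript
// Hypothetical helper: returns true when `candidate` falls under one of the
// allowed realm prefixes. A plain string startsWith check would wrongly accept
// hosts such as realms.example.test.evil.com for the prefix
// https://realms.example.test.
function isAllowedRealmTarget(
  candidate: string,
  allowedPrefixes: string[],
): boolean {
  let target: URL;
  try {
    target = new URL(candidate);
  } catch {
    return false; // not a parseable absolute URL
  }
  return allowedPrefixes.some((prefix) => {
    let allowed: URL;
    try {
      allowed = new URL(prefix);
    } catch {
      return false;
    }
    // Step 1: scheme, host, and port must match exactly.
    if (target.origin !== allowed.origin) return false;
    // Step 2: path must equal the prefix path or sit below it.
    const base = allowed.pathname.endsWith('/')
      ? allowed.pathname
      : `${allowed.pathname}/`;
    return target.pathname === allowed.pathname || target.pathname.startsWith(base);
  });
}
```

Comparing `origin` rather than a string prefix is what closes the subdomain-suffix case, and matching on a slash-terminated path keeps `/user/` from also admitting `/username/`.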
FadhlanR and others added 2 commits April 30, 2026 13:45
boxel-cli is now the single source of truth for the realm-operation tools
the factory needs. The factory imports `getToolDefinitions(client, config)`
and spreads the result into its `FactoryTool[]` via `adaptBoxelTool`,
which applies `enforceRealmSafety` at the seam.

What boxel-cli exports
- `getToolDefinitions(client, config)` returns 14 tool definitions backed
  by `BoxelCLIClient` methods.
- Tool names use `verb_noun` snake_case throughout, with a `realm_` prefix
  on operations against an existing realm:
    realm_read_file, realm_write_file, realm_delete_file,
    realm_search, realm_sync, realm_push, realm_pull,
    realm_list_files, realm_lint_file,
    realm_wait_for_ready, realm_cancel_indexing
  Plus `create_realm` (verb leads — the realm doesn't exist yet) and
  `read_transpiled` / `run_command` (no target-workspace counterpart, so
  no prefix needed).
- `client.push()` added to BoxelCLIClient; `commands/realm/push.ts`
  refactored to expose a clean `push()` lib function alongside the
  commander wrapper, mirroring how `pull()` and `sync()` are factored.

Factory cleanup
- Retired `BOXEL_CLI_TOOLS`, `BOXEL_CLI_COMMAND_MAP`, `executeBoxelCli()`,
  and `buildBoxelCliArgs()` from the factory. `ToolRegistry` now owns
  only `SCRIPT_TOOLS`; the manifest category union narrowed to `'script'`.
- All prompts, the `software-factory-operations` skill, test files,
  smoke scripts, and inline comments updated to the new tool names.
- `packages/software-factory/package.json`: removed orphaned
  `volta.extends` block (root has no Volta config) so `pnpm test` runs
  without a Volta error. Added `test:playwright:shard` script.

Realm safety wiring
- `RealmSafetyConfig.sourceRealmUrl` + `allowedRealmPrefixes` were
  declared but never populated in production wiring. Added
  `buildRealmSafetyConfig({ targetRealmUrl, realmServerUrl })` next to
  `enforceRealmSafety` that derives both via the existing
  `sourceRealmURLFor()` helper. Wired into `factory-issue-loop-wiring.ts`
  (for `ToolExecutor`) and `factory-tool-builder.ts` (for adapted
  boxel-cli tools). Source-realm rejection is now active in production;
  agent can target any realm hosted on the same realm server modulo the
  source-realm rejection that fires first.
- Dropped `sourceRealmUrl?` / `allowedRealmPrefixes?` from
  `ToolBuilderConfig` — they're derivable, not config inputs. Tests +
  smoke continue constructing `RealmSafetyConfig` directly with explicit
  values for scenario coverage.
- `TARGET_REALM_BYPASS_TOOLS` updated to the new file-op names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove workspace read_file/write_file factory tools (Hassan); agent
  uses its native filesystem tools on the workspace dir instead.
- Fix requireStringArg to return raw.trim() (Copilot).
- Tighten validateRealmTarget: parse with URL, require origin equality,
  then path-prefix-match within that origin — blocks subdomain-suffix
  SSRF like realms.example.test.evil.com (Copilot).
- Rewire factory-tools-smoke.ts to invoke realm tools via
  getToolDefinitions + adaptBoxelTool instead of ToolExecutor.execute,
  matching the runtime path; registry asserts script-only (Codex, Copilot).
- Update prompts and skills: realm_search now documented with required
  realm-url; target-realm I/O guidance points at native filesystem tools
  (Copilot).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@FadhlanR FadhlanR force-pushed the cs-10670-boxel-cli-publishes-tool-definitions-for-factory-consumption branch from 3da54c5 to f104657 Compare April 30, 2026 06:47
Earlier review-feedback commit framed the guidance as "the agent does
not invoke boxel sync / boxel push / boxel pull" — too broad. The
orchestration loop only owns target-realm sync; the agent has its own
realm tools available for any other realm operation. Reframe as
enabling instead of prohibitive: explain the workspace shortcut for
target-realm files and trust the published tool descriptions for
everything else (no enumeration of tool names in prose).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jurgenwerk
Contributor

jurgenwerk commented Apr 30, 2026

This overlaps significantly with my work in #4578. I think we need to discuss this together with @habdelra.

My assumption was that we are retiring many of the tools listed here and instead making the agent use its native commands + boxel CLI directly. That's where my PR went: the file-I/O wrappers (read_file, write_file, search_realm, fetch_transpiled_module) and the compound mutations (update_project, update_issue, create_knowledge, create_catalog_spec, add_comment) are gone, along with realm-read / realm-write / realm-delete. Claude uses the SDK's built-in Read / Write / Edit / Glob / Grep / Bash (npx boxel search, npx boxel pull); factory-tool-executor.ts and factory-tool-registry.ts are deleted because the registered-tool dispatch ended up unused on all paths.

This PR moves the tool definitions into boxel-cli for the factory to consume, keeping the dispatch alive, just relocating it.

So as a general direction I am wondering:

  1. Are we retiring the wrappers, or relocating them?
  2. If relocating: who's the consumer? In my PR, factory:go calls native tools / boxel CLI directly now — would published definitions ship for an external agent / IDE integration, or does the factory still need them?

@FadhlanR
Contributor Author

If the factory can call the boxel CLI directly, I don't think we need to wrap boxel CLI commands in tool definitions.

@jurgenwerk
Contributor

jurgenwerk commented Apr 30, 2026

Actually, I just realized that if we want to continue supporting OpenRouter (I think that would be appropriate), then we have to maintain tool definitions, even for reading and writing a file. OpenRouter doesn't ship any built-in tools: every model reached through it sees only the tools we send and only knows how to dispatch to JS functions we've written. There's no SDK doing it under the hood like there is with the Claude agent.

So I think landing this PR first makes sense, and then in my PR I can make sure Claude uses native tools (and not FactoryTool tools).

This is the minimum list of tools OpenRouter needs (I verified this with a factory test run): read_file, write_file, edit_file, search_realms, run_command, run_lint, run_tests, run_evaluate, run_parse, run_instantiate, signal_done, request_clarification
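
To make the "dispatch to JS functions we've written" point concrete, here is a minimal sketch of the host-side step. It assumes the OpenAI-compatible tool-call shape (a function name plus a JSON-encoded `arguments` string); the type names, the handler map, and the error payloads are hypothetical and not taken from the factory code.

```typescript
// Assumed OpenAI-compatible tool-call shape returned by the model.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

// Message sent back to the model with the tool's output.
interface ToolResultMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// The model can only name a tool; the host looks up and runs the handler.
async function dispatchToolCall(
  call: ToolCall,
  handlers: Map<string, ToolHandler>,
): Promise<ToolResultMessage> {
  const reply = (content: string): ToolResultMessage => ({
    role: 'tool',
    tool_call_id: call.id,
    content,
  });
  const handler = handlers.get(call.function.name);
  if (!handler) {
    return reply(JSON.stringify({ error: `unknown tool: ${call.function.name}` }));
  }
  let args: Record<string, unknown>;
  try {
    args = JSON.parse(call.function.arguments) as Record<string, unknown>;
  } catch {
    return reply(JSON.stringify({ error: 'arguments were not valid JSON' }));
  }
  try {
    return reply(await handler(args));
  } catch (err) {
    // Report failures to the model instead of crashing the loop.
    const message = err instanceof Error ? err.message : String(err);
    return reply(JSON.stringify({ error: message }));
  }
}
```

Without an entry in `handlers` for something like read_file, a model reached this way has no path to the filesystem at all, which is the gap described above.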

@habdelra
Contributor

so OpenRouter cannot manipulate a file system on its own? that seems odd. maybe worth an office hours discussion? but yes, I agree @jurgenwerk, we should definitely be retiring wrappers and leaning on skills. the fewer tools we have the better IMHO.

@jurgenwerk
Contributor

jurgenwerk commented Apr 30, 2026

> so open router cannot manipulate a file system on its own? that seems odd

Yes, Claude keeps telling me about that inability, chat too: https://chatgpt.com/share/e/69f35079-30d4-800b-9c41-e3030beecd9f

I also tried to run the factory with the OpenRouter agent, with read/write file commented out in factory-tool-builder.ts. It got stuck for a while, but interestingly enough it eventually figured out how to do it using commands, for example run_command to invoke @cardstack/boxel-host/commands/read-text-file/default. But this is very expensive and inefficient (if we don't adjust the skills).

@habdelra
Contributor

that's fascinating. i guess the bigger concern i have is how important OpenRouter is to the software factory.

4 participants