CS-10670: boxel-cli publishes tool definitions for factory consumption#4471
CS-10670: boxel-cli publishes tool definitions for factory consumption#4471
Conversation
|
we should also incorporate the lint tool from #4479 as well |
fd93e64 to
57f2058
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57f2058de8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR makes @cardstack/boxel-cli the single source of truth for realm-operation tool definitions consumed by @cardstack/software-factory, replacing the factory’s prior registry/executor dispatch for realm-api + boxel-cli tools and standardizing tool naming (snake_case, realm_ prefix).
Changes:
- Added
getToolDefinitions(client, config)to boxel-cli to publish 14 tool definitions (name/description/JSON-schema/execute) backed byBoxelCLIClient. - Updated software-factory to build/adopt those boxel-cli tools via
adaptBoxelTool(wrapping withenforceRealmSafety) and narrowedToolRegistry/manifests to script tools only. - Updated tests, prompts, and docs to use the new tool names and wiring; removed obsolete factory boxel-cli subprocess plumbing.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/software-factory/tests/factory-tool-registry.test.ts | Updates registry expectations now that only script tools are registered. |
| packages/software-factory/tests/factory-tool-executor.test.ts | Refactors tests to cover script-tool executor behavior and safety wrapping for adopted boxel-cli tools. |
| packages/software-factory/tests/factory-tool-executor.spec.ts | Updates Playwright tests to execute realm_* tools via adopted boxel-cli tool definitions rather than executor dispatch. |
| packages/software-factory/tests/factory-tool-executor.integration.test.ts | Updates integration tests to validate boxel-cli tool wire shape via getToolDefinitions. |
| packages/software-factory/tests/factory-tool-builder.test.ts | Updates builder tests to expect realm tools from boxel-cli tool definitions and new tool names. |
| packages/software-factory/src/harness/database.ts | Comment update for renamed realm_search tool. |
| packages/software-factory/src/factory-tool-registry.ts | Removes boxel-cli/realm-api manifests; registry now only owns script tool manifests. |
| packages/software-factory/src/factory-tool-executor.ts | Extracts realm safety guard into exported enforceRealmSafety + adds buildRealmSafetyConfig; executor dispatch narrowed to script tools. |
| packages/software-factory/src/factory-tool-builder.ts | Adopts all boxel-cli tool definitions into agent tool list via adaptBoxelTool + safety wrapping; removes factory-local realm/search/run-command wrappers. |
| packages/software-factory/src/factory-issue-loop-wiring.ts | Updates wiring to use script-only ToolRegistry and buildRealmSafetyConfig. |
| packages/software-factory/src/factory-agent/types.ts | Narrows ToolManifest.category to 'script'. |
| packages/software-factory/scripts/smoke-tests/factory-tools-smoke.ts | Partially renames tools, but still uses executor/registry assumptions that appear outdated. |
| packages/software-factory/prompts/ticket-iterate.md | Updates guidance to realm_search with explicit realm-url. |
| packages/software-factory/prompts/ticket-implement.md | Updates guidance to realm_search with explicit realm-url. |
| packages/software-factory/prompts/system.md | Updates system instructions to realm_search with explicit realm-url. |
| packages/software-factory/prompts/bootstrap-implement.md | Updates bootstrap instructions to realm_search with explicit realm-url. |
| packages/software-factory/package.json | Removes orphaned volta.extends config. |
| packages/software-factory/.agents/skills/software-factory-operations/SKILL.md | Updates tool names but has a parameter-signature mismatch for realm_search. |
| packages/boxel-cli/tests/lib/tool-definitions.test.ts | Adds unit tests for getToolDefinitions (14 tools) and option mapping. |
| packages/boxel-cli/src/lib/tool-definitions.ts | Adds published boxel-cli tool definitions and requireStringArg helper. |
| packages/boxel-cli/src/lib/boxel-cli-client.ts | Adds push() wrapper and delegates runCommand() to standalone command implementation. |
| packages/boxel-cli/src/commands/run-command.ts | Makes runCommand() return structured errors (vs throw) and supports overriding realm server URL. |
| packages/boxel-cli/src/commands/realm/push.ts | Factors push into reusable push() returning PushResult and keeps CLI wrapper behavior. |
| packages/boxel-cli/api.ts | Exports getToolDefinitions, requireStringArg, and related types from the public API. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
i would have expected buildReadFileTool and buildWriteFileTool also to be removed. i think the boxel-cli has taken over this completly and hence needs to define the tool now
There was a problem hiding this comment.
There are two types of write_file and read_file tools. The write_file and read_file tools operate on the local filesystem, while realm_write_file and realm_read_file handle reading and writing files to a realm via an endpoint. The buildWriteFileTool and buildReadFileTool that remain here are for local filesystem operations, while the realm-based implementations have been migrated to the Boxel CLI.
There was a problem hiding this comment.
we dont need a tool tho to tell teh agent how to read and write a file on the local file system--every agent should know how to do that. the best thing we can do here is to get out of teh agent's way. it has a vast experience in doing this from its training data.
There was a problem hiding this comment.
i see we have file read and file write commands too in teh boxel-cli--that is unnecessary, the agent should know how to read and write a file from the local file system. you definitely see this when you use claude code--it's all the cat command that it's issuing as well as just piping into file descriptors that it does. there is no need to tell the agent how to do this, especially forcing it to use node. what it does is actually more efficient than any tool we can give it.
There was a problem hiding this comment.
Agree with you that we don't need read_file and write_file as tools, the agent handles local file ops better with its native tools, so I removed them from factory-tool-builder. But the file read and write in boxel-cli are still needed since they're not for local files, they hit the realm server over HTTP with the Matrix JWT to read/write files in non-target realms (scratch, source, catalog) that aren't synced locally, and the agent can't do that natively without realm credentials. The TARGET_REALM_BYPASS_TOOLS guard already blocks them from the target realm so the agent only uses them for non-target work. Happy to rename if realm_*_file feels confusing.
There was a problem hiding this comment.
dont we have a realm file read and realm file write for that purpose? we should be consistent with the name, the naming consistency matters to the agent
There was a problem hiding this comment.
we dont need a tool tho to tell teh agent how to read and write a file on the local file system--every agent should know how to do that.
But what about OpenRouter? I think this agent has no way to touch the filesystem without providing it tools
There was a problem hiding this comment.
really? i would think that all the agents used via open router have basic file system manipulation knowledge
- Remove workspace read_file/write_file factory tools (Hassan); agent uses its native filesystem tools on the workspace dir instead. - Fix requireStringArg to return raw.trim() (Copilot). - Tighten validateRealmTarget: parse with URL, require origin equality, then path-prefix-match within that origin — blocks subdomain-suffix SSRF like realms.example.test.evil.com (Copilot). - Rewire factory-tools-smoke.ts to invoke realm tools via getToolDefinitions + adaptBoxelTool instead of ToolExecutor.execute, matching the runtime path; registry asserts script-only (Codex, Copilot). - Update prompts and skills: realm_search now documented with required realm-url; target-realm I/O guidance points at native filesystem tools (Copilot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
boxel-cli is now the single source of truth for the realm-operation tools
the factory needs. The factory imports `getToolDefinitions(client, config)`
and spreads the result into its `FactoryTool[]` via `adaptBoxelTool`,
which applies `enforceRealmSafety` at the seam.
What boxel-cli exports
- `getToolDefinitions(client, config)` returns 14 tool definitions backed
by `BoxelCLIClient` methods.
- Tool names use `verb_noun` snake_case throughout, with a `realm_` prefix
on operations against an existing realm:
realm_read_file, realm_write_file, realm_delete_file,
realm_search, realm_sync, realm_push, realm_pull,
realm_list_files, realm_lint_file,
realm_wait_for_ready, realm_cancel_indexing
Plus `create_realm` (verb leads — the realm doesn't exist yet) and
`read_transpiled` / `run_command` (no target-workspace counterpart, so
no prefix needed).
- `client.push()` added to BoxelCLIClient; `commands/realm/push.ts`
refactored to expose a clean `push()` lib function alongside the
commander wrapper, mirroring how `pull()` and `sync()` are factored.
Factory cleanup
- Retired `BOXEL_CLI_TOOLS`, `BOXEL_CLI_COMMAND_MAP`, `executeBoxelCli()`,
and `buildBoxelCliArgs()` from the factory. `ToolRegistry` now owns
only `SCRIPT_TOOLS`; the manifest category union narrowed to `'script'`.
- All prompts, the `software-factory-operations` skill, test files,
smoke scripts, and inline comments updated to the new tool names.
- `packages/software-factory/package.json`: removed orphaned
`volta.extends` block (root has no Volta config) so `pnpm test` runs
without a Volta error. Added `test:playwright:shard` script.
Realm safety wiring
- `RealmSafetyConfig.sourceRealmUrl` + `allowedRealmPrefixes` were
declared but never populated in production wiring. Added
`buildRealmSafetyConfig({ targetRealmUrl, realmServerUrl })` next to
`enforceRealmSafety` that derives both via the existing
`sourceRealmURLFor()` helper. Wired into `factory-issue-loop-wiring.ts`
(for `ToolExecutor`) and `factory-tool-builder.ts` (for adapted
boxel-cli tools). Source-realm rejection is now active in production;
agent can target any realm hosted on the same realm server modulo the
source-realm rejection that fires first.
- Dropped `sourceRealmUrl?` / `allowedRealmPrefixes?` from
`ToolBuilderConfig` — they're derivable, not config inputs. Tests +
smoke continue constructing `RealmSafetyConfig` directly with explicit
values for scenario coverage.
- `TARGET_REALM_BYPASS_TOOLS` updated to the new file-op names.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove workspace read_file/write_file factory tools (Hassan); agent uses its native filesystem tools on the workspace dir instead. - Fix requireStringArg to return raw.trim() (Copilot). - Tighten validateRealmTarget: parse with URL, require origin equality, then path-prefix-match within that origin — blocks subdomain-suffix SSRF like realms.example.test.evil.com (Copilot). - Rewire factory-tools-smoke.ts to invoke realm tools via getToolDefinitions + adaptBoxelTool instead of ToolExecutor.execute, matching the runtime path; registry asserts script-only (Codex, Copilot). - Update prompts and skills: realm_search now documented with required realm-url; target-realm I/O guidance points at native filesystem tools (Copilot). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3da54c5 to
f104657
Compare
Earlier review-feedback commit framed the guidance as "the agent does not invoke boxel sync / boxel push / boxel pull" — too broad. The orchestration loop only owns target-realm sync; the agent has its own realm tools available for any other realm operation. Reframe as enabling instead of prohibitive: explain the workspace shortcut for target-realm files and trust the published tool descriptions for everything else (no enumeration of tool names in prose). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This overlaps significantly with my work here #4578, I think we need to discuss this together with @habdelra. My assumption was that we are retiring many of the tools listed here and instead make the agent use its native commands + boxel CLI directly. That's where my PR went: the file-I/O wrappers (
This PR moves the tool definitions into boxel-cli for the factory to consume, keeping the dispatch alive, just relocating it. So as a general direction I am wondering:
|
If factory can call boxel CLI directly, I don't think we need to wrap boxel cli commands in tools definition |
|
Actually, I just realized that if we want continue supporting OpenRouter (I think that would be appropriate), then we have to maintain tool definitions, even for reading and writing a file, because OpenRouter doesn't ship any built-in tools, every model reached through it sees only what tools we send and only knows how to dispatch to JS functions we've written. There's no SDK doing it under the hood like there is with Claude agent. So I think landing this PR first makes sense, and then in my PR, I can make sure claude uses native tools (and not FactoryTool tools). This is the minimum list of tools openrouter needs (I verified this with a factory test run): |
so open router cannot manipulate a file system on its own? that seems odd. maybe worth a office hours discussion? but yes, I agree @jurgenwerk definitely we should be retriring wrappers and leaning on skills. the less tools we have the better IMHO. |
Yes, Claude keeps telling me about that inability, chat too: https://chatgpt.com/share/e/69f35079-30d4-800b-9c41-e3030beecd9f I also tried to run the factory with openrouter agent, with read/write file commented out in factory-tool-builder.ts, and it got stuck for a while but interestingly enough, it eventually figured it out how to do it using commands, |
thats fascinating. i guess the bigger concern i have is how important is open router to the software factory. |
Summary
boxel-cli is now the single source of truth for the realm-operation tools the factory consumes. The factory calls
getToolDefinitions(client, config)and spreads the result into itsFactoryTool[]viaadaptBoxelTool, which appliesenforceRealmSafetyat the seam. The factory's parallel subprocess dispatch (BOXEL_CLI_TOOLS+executeBoxelCli) is retired.What boxel-cli exports
getToolDefinitions(client, config)returns 14 tool definitions backed byBoxelCLIClientmethods. Each hasname,description, JSON-Schemaparameters, and a pre-boundexecutefunction — consumers don't need to know about realm URLs or auth.realm_read_fileboxel file readclient.readrealm_write_fileboxel file writeclient.writerealm_delete_fileboxel file deleteclient.deleterealm_list_filesboxel file listclient.listFilesrealm_lint_fileboxel file lintclient.lintrealm_searchboxel searchclient.searchrealm_syncboxel realm syncclient.syncrealm_pushboxel realm pushclient.pushrealm_pullboxel realm pullclient.pullrealm_wait_for_readyboxel realm wait-for-readyclient.waitForReadyrealm_cancel_indexingboxel realm cancel-indexingclient.cancelAllIndexingJobscreate_realmboxel realm createclient.createRealmread_transpiledboxel read-transpiledclient.readTranspiledrun_commandboxel run-commandclient.runCommandNaming
verb_nounsnake_case throughout, with arealm_prefix on operations against an existing realm.create_realmputs the verb first because the realm doesn't yet exist to scope against. Tools without a target-workspace counterpart (read_transpiled,run_command) don't need the prefix.The
realm_prefix on the file-op tools is load-bearing — it disambiguates from the factory-nativeread_file/write_file(target-workspace, norealm-urlparameter), lettingrealm_read_fileread as "the realm-scoped version ofread_file".What changed in the factory
BOXEL_CLI_TOOLS,BOXEL_CLI_COMMAND_MAP,executeBoxelCli(),buildBoxelCliArgs().ToolRegistrynow owns onlySCRIPT_TOOLS; the manifestcategoryunion narrowed to'script'.TARGET_REALM_BYPASS_TOOLSupdated torealm_read_file/realm_write_file/realm_delete_file; safety guard logic unchanged.system.md,ticket-implement.md,ticket-iterate.md,bootstrap-implement.md), thesoftware-factory-operationsskill, test files, smoke scripts, and inline comments updated to the new tool names.Plumbing
client.push()added toBoxelCLIClient;commands/realm/push.tsnow exports a cleanpush()lib function alongside the commander wrapper, mirroring howpull()andsync()are factored.BoxelCLIClient.runCommanddelegates to the standalonerunCommand()incommands/run-command.ts(matchingclient.lint,client.listFiles, etc.).requireStringArg()helper moved from factory to boxel-cli as a shared utility (still publicly exported).packages/software-factory/package.json: removed orphanedvolta.extendsblock (root has no Volta config) sopnpm testruns without a Volta error.Test plan
pnpm --filter @cardstack/boxel-cli test— 169/169 unit tests passing, including 24 intool-definitions.test.ts(covers all 14 tools + sync/push/pull builders). Integration tests skipped (require Synapse).pnpm --filter @cardstack/boxel-cli lint:types— clean.pnpm --filter @cardstack/software-factory test— 449/450 passing. The one failure isport-allocator > IPv4 holder blocks dual-stack bind, an OS-level dual-stack networking flake unrelated to this work.realm-read,realm-write,realm-delete,realm-search,realm-create,fetch_transpiled_module,list_files,lint_file,wait_for_ready,cancel_indexing) or retired symbols (BOXEL_CLI_TOOLS,executeBoxelCli,BOXEL_CLI_COMMAND_MAP,buildBoxelCliArgs) anywhere in source/tests/prompts/skills.Linear
Closes CS-10670
🤖 Generated with Claude Code