The spec today assumes tools are called from a top-level browsing context with the user present in the tab. The explainer lists headless / no-human-present execution as a non-goal, and proposal.md flags it as a future consideration ("There is currently no support for agents or assistive tools to call tools 'headlessly' without visible browser UI").
In practice the ecosystem is already pulling in the other direction:
- webmcp-proxy (Alpic) lets a remote MCP server's tools show up as if the page itself had registered them, so an off-page agent can invoke them through
navigator.modelContext.
- Cloudflare's Browser Run lets an off-device agent drive a headless Chrome session and call the WebMCP tools registered by whatever page it lands on.
The problem
There are really two questions tangled together here:
-
Should WebMCP tools be reachable from off-page callers at all? Today the spec is silent on this, and whatever bridge happens to be in the loop ends up making the call by default.
-
If yes, how does a site set different policies for different tools? Not every tool wants the same exposure:
- A
search_products tool: fine to expose broadly. The site benefits from a remote agent discovering inventory.
- A
checkout or add_payment_method tool: should only run when the user is actually in the tab and can see what's happening. Letting this be invoked from a background proxy or a remote agent is not something most merchants would knowingly opt into.
Right now there is no way for the site to say "this tool, yes; that tool, only when the user is here." Once a tool is registered, anything with access to navigator.modelContext in that context can call it. The existing requestUserInteraction() primitive lets a tool prompt for confirmation, but it doesn't let the site refuse calls based on where the caller is.
Proposal
Concretely, I think this wants to live as a hint on the tool itself, in the same family as MCP's existing readOnlyHint / destructiveHint / idempotentHint / openWorldHint annotations.
Rough sketch (illustrative, not a proposed API):
// In-page only — agent must be invoking from the tab the user is in.
navigator.modelContext.registerTool({
name: "checkout",
description: "Complete the order using the current cart.",
inputSchema: { /* ... */ },
annotations: {
destructiveHint: true,
executionSurfaces: ["in-page"], // <-- new
},
async execute(params) { /* ... */ },
});
// Safe to expose to remote callers and background bridges.
navigator.modelContext.registerTool({
name: "search_products",
description: "Search the catalog.",
inputSchema: { /* ... */ },
annotations: {
readOnlyHint: true,
executionSurfaces: ["in-page", "background", "remote"], // <-- new
},
async execute(params) { /* ... */ },
});
The shape I have in mind:
executionSurfaces is a set of allowed call sites, not a single enum, so a tool can opt into more than one without losing the ability to refuse others.
- Candidate values worth discussing:
in-page (top-level browsing context, user-visible tab), background (same-origin worker / launch-handler-style headless invocation), remote (proxied to an off-device MCP client).
- Default is
["in-page"] if omitted, so opening a tool to remote or background callers is an explicit act by the site.
- The browser, and any bridge sitting between
navigator.modelContext and a non-in-page client (e.g. webmcp-proxy), is expected to filter tools/list and reject tools/call for surfaces a tool hasn't opted into.
Like the existing hints, this would be advisory rather than a security boundary — a misbehaving extension in the same tab can still call whatever it can see. The value is that well-behaved bridges and browser-mediated agents have an unambiguous declaration of intent to honor, and sites have a clean basis to point at when one doesn't.
The spec today assumes tools are called from a top-level browsing context with the user present in the tab. The explainer lists headless / no-human-present execution as a non-goal, and proposal.md flags it as a future consideration ("There is currently no support for agents or assistive tools to call tools 'headlessly' without visible browser UI").
In practice the ecosystem is already pulling in the other direction:
navigator.modelContext.The problem
There are really two questions tangled together here:
Should WebMCP tools be reachable from off-page callers at all? Today the spec is silent on this, and whatever bridge happens to be in the loop ends up making the call by default.
If yes, how does a site set different policies for different tools? Not every tool wants the same exposure:
search_productstool: fine to expose broadly. The site benefits from a remote agent discovering inventory.checkoutoradd_payment_methodtool: should only run when the user is actually in the tab and can see what's happening. Letting this be invoked from a background proxy or a remote agent is not something most merchants would knowingly opt into.Right now there is no way for the site to say "this tool, yes; that tool, only when the user is here." Once a tool is registered, anything with access to
navigator.modelContextin that context can call it. The existingrequestUserInteraction()primitive lets a tool prompt for confirmation, but it doesn't let the site refuse calls based on where the caller is.Proposal
Concretely, I think this wants to live as a hint on the tool itself, in the same family as MCP's existing
readOnlyHint/destructiveHint/idempotentHint/openWorldHintannotations.Rough sketch (illustrative, not a proposed API):
The shape I have in mind:
executionSurfacesis a set of allowed call sites, not a single enum, so a tool can opt into more than one without losing the ability to refuse others.in-page(top-level browsing context, user-visible tab),background(same-origin worker / launch-handler-style headless invocation),remote(proxied to an off-device MCP client).["in-page"]if omitted, so opening a tool to remote or background callers is an explicit act by the site.navigator.modelContextand a non-in-page client (e.g. webmcp-proxy), is expected to filtertools/listand rejecttools/callfor surfaces a tool hasn't opted into.Like the existing hints, this would be advisory rather than a security boundary — a misbehaving extension in the same tab can still call whatever it can see. The value is that well-behaved bridges and browser-mediated agents have an unambiguous declaration of intent to honor, and sites have a clean basis to point at when one doesn't.