feat(mcp): forward file_pattern in semantic_search to scope results #1149
carlos-alm wants to merge 7 commits into
The MCP semantic_search wrapper silently dropped any file scoping argument because file_pattern was not declared on its args interface, even though the underlying search core has supported it all along (the CLI exposes it via --file). In monorepos with multiple large subtrees, this made MCP semantic_search effectively unusable: top-K was dominated by the larger, less-relevant tree with no signal to the caller that the filter was ignored.

Add file_pattern (string | string[]) to the handler args, forward it as filePattern into searchOpts for hybrid/semantic/keyword modes, and declare it in the tool input schema. Unit tests assert the arg reaches each search backend.

Closes #1143
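As a rough illustration of the forwarding described above, the following is a minimal sketch and not the code from this PR. The interface names `SemanticSearchArgs` and `SearchOpts` and the helper `buildSearchOpts` are hypothetical; only the field names (`file_pattern`, `filePattern`, `min_score`, `minScore`, `limit`, `offset`) come from the PR text.

```typescript
// Hypothetical shape of the MCP handler args; the real interface may differ.
interface SemanticSearchArgs {
  query: string;
  mode?: "hybrid" | "semantic" | "keyword";
  limit?: number;
  offset?: number;
  min_score?: number;
  // Newly declared argument: without it, callers' filters never reached the core.
  file_pattern?: string | string[];
}

// Hypothetical shape of the options object handed to the search core.
interface SearchOpts {
  limit?: number;
  offset?: number;
  minScore?: number;
  filePattern?: string | string[];
}

// Map the snake_case MCP args onto the camelCase options used by the core.
function buildSearchOpts(args: SemanticSearchArgs): SearchOpts {
  return {
    limit: args.limit,
    offset: args.offset,
    minScore: args.min_score,
    filePattern: args.file_pattern,
  };
}
```

When `file_pattern` is omitted, `filePattern` is simply `undefined`, which is why the change can stay backwards-compatible for existing callers.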
Claude finished @carlos-alm's task

Root Cause Analysis

Problem Being Solved: MCP semantic search lacked file pattern filtering, causing queries in monorepos to return unscoped results from the entire codebase instead of targeted subdirectories. This made the tool "effectively unusable" for large codebases where users needed focused search results.

Why This Approach: The PR adds [...]

Risk Assessment: Very low risk. This is a pure additive change that forwards an existing parameter through the MCP interface without modifying core search logic.

Backlog Compliance

Review Findings

Phase 1: Root Cause Analysis ✅
The PR addresses a legitimate usability gap. MCP clients passing [...]

Phase 2: Configuration Changes ✅
No configuration changes detected. All modifications are to source code and tests.

Phase 3: Test Integrity ✅
New test in [...]
Existing tests continue to pass, confirming no regressions.

Phase 4: Code Quality ✅
Schema Declaration ([...]):

    file_pattern: {
      oneOf: [{ type: 'string' }, { type: 'array', items: { type: 'string' } }],
      description: 'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
    },

Well-designed schema using [...]

Parameter Forwarding ([...]):

    filePattern: args.file_pattern,

Clean parameter mapping from MCP convention ([...])

Backend Implementation: Reviewed existing search implementations: [...]
All backends already supported [...]

Minor Observations

Final Recommendation

This PR successfully brings MCP-CLI parity for file pattern filtering and will significantly improve the usability of semantic search in monorepo environments.
Greptile Summary

This PR adds [...]

Confidence Score: 5/5
Safe to merge: the change is additive and backwards-compatible, touching only the MCP tool interface and its schema declaration.
The [...]
No files require special attention.

Important Files Changed
Sequence Diagram

sequenceDiagram
participant C as MCP Client
participant R as tool-registry.ts
participant H as semantic-search.ts handler
participant S as search/index.js
C->>R: "semantic_search {query, file_pattern, mode, limit}"
R->>H: dispatch(args, ctx)
H->>H: "build searchOpts {limit, offset, minScore, filePattern}"
alt "mode = keyword"
H->>S: ftsSearchData(query, dbPath, searchOpts)
else "mode = semantic"
H->>S: searchData(query, dbPath, searchOpts)
else "mode = hybrid (default)"
H->>S: hybridSearchData(query, dbPath, searchOpts)
S-->>H: null (no FTS5)
H->>S: searchData(query, dbPath, searchOpts)
end
S-->>H: "{results}"
H-->>C: "{results} scoped to file_pattern"
Reviews (8): Last reviewed commit: "Merge branch 'main' into feat/mcp-semant..."
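The dispatch shown in the sequence diagram can be sketched roughly as follows. This is an illustrative approximation, not the handler from the PR: the backend names `ftsSearchData`, `searchData`, and `hybridSearchData` are taken from the diagram, while `dispatchSearch`, the `Backends` container, the type names, and the synchronous signatures are assumptions made for brevity (the real functions may well be asynchronous and imported directly).

```typescript
interface SearchOpts {
  limit?: number;
  offset?: number;
  minScore?: number;
  filePattern?: string | string[];
}

interface SearchData {
  results: unknown[];
}

// Assumed signature: each backend takes (query, dbPath, opts); null means "unavailable".
type Backend = (query: string, dbPath: string, opts: SearchOpts) => SearchData | null;

// Hypothetical container so the backends can be stubbed in tests.
interface Backends {
  ftsSearchData: Backend;
  searchData: Backend;
  hybridSearchData: Backend;
}

type Mode = "hybrid" | "semantic" | "keyword";

function dispatchSearch(
  mode: Mode,
  query: string,
  dbPath: string,
  opts: SearchOpts,
  backends: Backends,
): SearchData | null {
  if (mode === "keyword") return backends.ftsSearchData(query, dbPath, opts);
  if (mode === "semantic") return backends.searchData(query, dbPath, opts);
  // Hybrid is the default; per the diagram it yields null when FTS5 is
  // unavailable, in which case the handler falls back to semantic search.
  const hybrid = backends.hybridSearchData(query, dbPath, opts);
  return hybrid !== null ? hybrid : backends.searchData(query, dbPath, opts);
}
```

The point of the sketch is that the same `opts` object, including `filePattern`, is passed on every branch, so the fallback path is scoped the same way as the primary one.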
description:
  'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
The schema description says "substring patterns" but the underlying backends (prepare.ts, filters.ts, keyword.ts) also support glob syntax (*, **, ?, […]). A caller reading only the schema would not know to try "src/**/*.ts", and those glob patterns do work end-to-end through applyFilters/globMatch. Mentioning glob support here keeps the schema accurate and prevents confusion.
- description:
-   'Restrict results to files matching one or more substring patterns (e.g. "db/" or ["db/", "src/"])',
+ description:
+   'Restrict results to files matching one or more glob or substring patterns (e.g. "db/", "src/**/*.ts", or ["db/", "src/"])',
Good catch — applied the suggestion verbatim in c35a237. Confirmed by reading src/domain/search/search/filters.ts that applyFilters does branch on /[*?[\]]/.test(p) and routes glob patterns through globMatch (which handles *, **, ?, and char classes), so the schema description now accurately reflects what the backends support.
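To make the substring-versus-glob branching concrete, here is a minimal sketch of how such a filter could behave. It is not the project's `applyFilters` or `globMatch`: the function names `globToRegExp`, `matchesFilePattern`, and `filterByPattern` are hypothetical, the glob translation is a simplified stand-in, and treating multiple patterns as "match any" is an assumption. Only the detection regex `/[*?[\]]/` is taken from the comment above.

```typescript
// Detection regex quoted in the review thread: any of * ? [ ] marks a glob.
const GLOB_CHARS = /[*?[\]]/;

// Simplified glob-to-RegExp translation (assumed semantics, anchored to the full path).
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        i++; // consume the second "*"
        if (glob.charAt(i + 1) === "/") {
          i++; // "**/" matches zero or more whole directories
          re += "(?:.*/)?";
        } else {
          re += ".*"; // bare "**" crosses directory separators
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (ch === "?") {
      re += "[^/]"; // exactly one non-separator character
    } else if (ch === "[") {
      const close = glob.indexOf("]", i + 1);
      if (close === -1) {
        re += "\\["; // unterminated class: treat "[" literally
      } else {
        re += "[" + glob.slice(i + 1, close).replace(/\\/g, "\\\\") + "]";
        i = close;
      }
    } else {
      re += ch.replace(/[.+^${}()|\\\]]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + re + "$");
}

function matchesFilePattern(file: string, pattern: string): boolean {
  return GLOB_CHARS.test(pattern)
    ? globToRegExp(pattern).test(file)
    : file.indexOf(pattern) !== -1; // plain substring match
}

// Assumption: a file is kept if it matches at least one pattern.
function filterByPattern(files: string[], pattern: string | string[]): string[] {
  const patterns = Array.isArray(pattern) ? pattern : [pattern];
  return files.filter((f) => patterns.some((p) => matchesFilePattern(f, p)));
}
```

Under these assumptions, `"db/"` keeps any path containing that substring, while `"src/**/*.ts"` keeps only `.ts` files at any depth under `src/`.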
Codegraph Impact Analysis
2 functions changed → 0 callers affected across 0 files
Summary
- semantic_search now accepts file_pattern (string or string[]) and forwards it as filePattern into the search core for hybrid, semantic, and keyword modes
- Declares file_pattern in the tool input schema so MCP clients can discover it
- Brings parity with the CLI's codegraph search --file <pattern> (repeatable)

Closes #1143

Why
Previously the args interface in src/mcp/tools/semantic-search.ts listed only {query, mode, limit, offset, min_score}. MCP silently ignores unknown args, so a caller passing {"query": "...", "file_pattern": ["db/"]} got unscoped global hits back with no error. In monorepos this made the tool effectively unusable from MCP: the larger subtree dominated top-K and the caller had no signal that the filter was dropped.

Test plan
- npx vitest run tests/unit/mcp.test.ts: 41/41 pass (includes new dispatch test asserting file_pattern reaches each backend as filePattern)
- npx vitest run tests/unit/mcp.test.ts tests/search/: 116/116 pass
- npx tsc --noEmit: clean
- npm run lint: clean for changed files