perf(query): reuse Memgraph connection in semantic search tools (#505) #545
Code Review
This pull request refactors the semantic search and function source tools to accept an injected ingestor instance rather than instantiating MemgraphIngestor internally. This change decouples the tools from configuration settings and simplifies database query execution by utilizing fetch_all instead of _execute_query. Corresponding updates were made to the MCP tools, main initialization, and test suites. Feedback on the changes suggests using QueryProtocol instead of the concrete MemgraphIngestor class for type annotations to resolve type mismatches with static analysis tools, and using .get() with fallbacks instead of direct dictionary access when parsing query results to prevent potential KeyError exceptions.
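The typing suggestion can be sketched as follows. This is a minimal illustration, not the repository's code: the `QueryProtocol` body, `FakeIngestor`, and `create_lookup_tool` are hypothetical stand-ins, and the real protocol may declare a different signature.

```python
from typing import Any, Callable, Optional, Protocol


class QueryProtocol(Protocol):
    """Assumed shape: any object exposing fetch_all() satisfies the protocol."""

    def fetch_all(
        self, query: str, params: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]: ...


class FakeIngestor:
    """Illustrative in-memory ingestor; no database is involved."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch_all(
        self, query: str, params: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]:
        self.calls += 1
        return [{"name": "foo"}]


def create_lookup_tool(
    ingestor: QueryProtocol,
) -> Callable[[str], list[dict[str, Any]]]:
    """Hypothetical factory: the tool closes over the injected ingestor."""

    def lookup(name: str) -> list[dict[str, Any]]:
        # Every call reuses the same injected object instead of opening a connection.
        return ingestor.fetch_all(
            "MATCH (n {name: $name}) RETURN n.name AS name", {"name": name}
        )

    return lookup
```

Because the annotation names the protocol rather than a concrete class, a static checker accepts both the real ingestor and a test double, provided each has a compatible `fetch_all`.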
    result = results_map[node_id]
    result_type = result["type"]
    type_str = (
        result_type[0]
        if isinstance(result_type, list) and result_type
        else cs.SEMANTIC_TYPE_UNKNOWN
    )
    formatted_results.append(
        SemanticSearchResult(
            node_id=node_id,
            qualified_name=str(result["qualified_name"]),
            name=str(result["name"]),
            type=type_str,
            score=round(score, 3),
        )
    )
To prevent potential KeyError exceptions at runtime, it is safer to use .get() when retrieving fields from the query results. This is especially important for dynamic graph databases where some properties (like type, qualified_name, or name) might occasionally be missing or malformed on certain nodes.
Using .get() with appropriate fallbacks also aligns with how other fields (like path, start_line, and end_line) are defensively retrieved in get_function_source_code later in this file.
Suggested change
      result = results_map[node_id]
    - result_type = result["type"]
    + result_type = result.get("type")
      type_str = (
          result_type[0]
          if isinstance(result_type, list) and result_type
          else cs.SEMANTIC_TYPE_UNKNOWN
      )
      formatted_results.append(
          SemanticSearchResult(
              node_id=node_id,
    -         qualified_name=str(result["qualified_name"]),
    -         name=str(result["name"]),
    +         qualified_name=str(result.get("qualified_name", "")),
    +         name=str(result.get("name", "")),
              type=type_str,
              score=round(score, 3),
          )
      )
Greptile Summary
This PR reuses the existing Memgraph ingestor for semantic search tools. The main changes are:
Confidence Score: 5/5
The change is narrowly scoped to dependency injection and public query API usage for semantic search tooling. No correctness issues were identified in the reviewed changes, and the updated tests cover the intended ingestor reuse behavior.
    - self._semantic_search_tool = create_semantic_search_tool()
    + self._semantic_search_tool = create_semantic_search_tool(self.ingestor)
Shared Connection Concurrent Reads
This captures the registry's long-lived ingestor for semantic search, while other MCP handlers use the same ingestor from worker threads. A semantic search request can now run fetch_all() on the event-loop thread at the same time as list_projects() or another handler runs on the same mgclient connection, and _get_cursor() only locks cursor creation, not query execution. Concurrent MCP requests can corrupt or fail graph reads because the shared connection is not serialized.
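One way to address this kind of concern is to hold a lock for the whole query rather than only for cursor creation. The sketch below is an assumption-laden illustration: `SerializedIngestor`, `OverlapDetectingIngestor`, and `run_concurrent_reads` are hypothetical names, and it does not describe how the project's ingestor actually locks.

```python
import threading
import time
from typing import Any, Optional


class OverlapDetectingIngestor:
    """Fake connection that records how many callers are inside fetch_all at once."""

    def __init__(self) -> None:
        self.active = 0
        self.max_active = 0
        self._guard = threading.Lock()

    def fetch_all(
        self, query: str, params: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]:
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # simulate a round trip on the shared connection
        with self._guard:
            self.active -= 1
        return []


class SerializedIngestor:
    """Hypothetical wrapper: one lock covers the entire query execution."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def fetch_all(
        self, query: str, params: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]:
        with self._lock:
            return self._inner.fetch_all(query, params)


def run_concurrent_reads(ingestor: Any, workers: int = 8) -> None:
    """Issue the same query from several threads and wait for all of them."""
    threads = [
        threading.Thread(target=ingestor.fetch_all, args=("RETURN 1",))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With the wrapper in place, `max_active` on the inner fake stays at 1 regardless of how many threads call in, which is the property the review comment is asking about.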
Path: codebase_rag/mcp/tools.py, line 96
      self._shell_command_tool,
      self._directory_lister_tool,
    - create_get_function_source_tool(),
    + create_get_function_source_tool(self.ingestor),
The source-code tool captured by the long-lived agent now uses the same registry ingestor as the rest of the MCP server. When ask_agent invokes this tool while another MCP request is reading or updating the graph through self.ingestor, source lookup can execute on the same connection without the registry lock. That can make source retrieval fail intermittently or break the shared Memgraph session.
Artifacts
Repro: focused shared-ingestor concurrency harness
- Contains supporting evidence from the run (text/x-python; charset=utf-8).
Repro: execution log showing concurrent use of the shared ingestor
- Keeps the command output available without making the summary code-heavy.
Ran code and verified through T-Rex
Comment: Agent Tool Shares Connection
Path: codebase_rag/mcp/tools.py, line 347
…load
Use QueryProtocol for ingestor typing to match main.py and other tools.
Parse graph result fields with .get() fallbacks.
Offload ingestor I/O via asyncio.to_thread in async tool wrappers, matching the query_code_graph pattern.
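The offloading described in the commit message can be sketched like this. `BlockingIngestor` and `semantic_search` are illustrative names only, and the actual wrappers in the PR may be structured differently; `asyncio.to_thread` itself is standard library (Python 3.9+).

```python
import asyncio
import threading
from typing import Any, Optional


class BlockingIngestor:
    """Fake ingestor that records which thread ran each query."""

    def __init__(self) -> None:
        self.thread_ids: list[int] = []

    def fetch_all(
        self, query: str, params: Optional[dict[str, Any]] = None
    ) -> list[dict[str, Any]]:
        self.thread_ids.append(threading.get_ident())
        return [{"name": "foo"}]


async def semantic_search(ingestor: Any, text: str) -> list[dict[str, Any]]:
    """Hypothetical async tool wrapper around a blocking ingestor call."""
    # to_thread runs the blocking call in a worker thread, so the event
    # loop stays free to serve other requests while the query is in flight.
    return await asyncio.to_thread(
        ingestor.fetch_all, "RETURN $q AS name", {"q": text}
    )
```

Offloading keeps the event loop responsive, but it does not by itself serialize access to a shared connection; that still depends on whatever locking the ingestor provides.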
vitali87 left a comment
@ChetanyaRathi thank you for the PR. Please resolve all Greptile and Gemini comments until the Greptile Confidence Score reaches 5.
I will review after.
@greptileai please re-review. The previous pass ran against the first commit (
On the two "shared connection" P1s, I believe these are false positives: every query on the shared ingestor is serialized by
Summary
- Reuse the injected MemgraphIngestor in semantic_code_search and get_function_source_code instead of opening a new Memgraph connection on every call.
- Switch ingestor._execute_query() calls to the public fetch_all() API.
- Update the create_semantic_search_tool / create_get_function_source_tool factories and their call sites in main.py and mcp/tools.py. Behavior is unchanged: fetch_all(query, params) is the public wrapper around _execute_query(query, params).
Type of Change
Related Issues
Closes #505
Test Plan
- Added test_semantic_code_search_reuses_injected_ingestor, which asserts that the injected ingestor's fetch_all() is called and _execute_query() is never called. This is a regression guard against re-introducing a per-call connection.
- Updated the test_semantic_search.py suite to pass a mock ingestor directly rather than patching MemgraphIngestor.
- Ran uv run pytest -m "not integration" locally: all 4019 non-integration tests pass. The only failures are the Java-oracle tests (test_java_*_oracle.py), which shell out to javac and fail solely because a JDK isn't installed in my local environment, so they are unrelated to this change.
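The regression guard in the first bullet follows a common mock-based pattern, sketched below. `semantic_lookup` and `check_reuses_injected_ingestor` are hypothetical names for illustration; this is not the test added in the PR.

```python
from typing import Any
from unittest.mock import MagicMock


def semantic_lookup(ingestor: Any, node_ids: list[int]) -> list[dict[str, Any]]:
    """Hypothetical tool body that must go through the public fetch_all() API."""
    return ingestor.fetch_all(
        "MATCH (n) WHERE id(n) IN $ids RETURN n.name AS name", {"ids": node_ids}
    )


def check_reuses_injected_ingestor() -> list[dict[str, Any]]:
    ingestor = MagicMock()
    ingestor.fetch_all.return_value = [{"name": "foo"}]

    rows = semantic_lookup(ingestor, [1])

    # The injected object is used exactly once through the public API...
    ingestor.fetch_all.assert_called_once()
    # ...and the private helper is never touched directly.
    ingestor._execute_query.assert_not_called()
    return rows
```

Passing the mock in directly, rather than patching the class, means the check fails as soon as the tool constructs its own ingestor instead of using the one it was given.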