perf(query): reuse Memgraph connection in semantic search tools (#505) by ChetanyaRathi · Pull Request #545 · vitali87/code-graph-rag

ChetanyaRathi · 2026-06-28T21:57:31Z

Summary

Reuse the orchestrator's existing MemgraphIngestor in semantic_code_search and get_function_source_code instead of opening a new Memgraph connection on every call.
Switch the private ingestor._execute_query() calls to the public fetch_all() API.
Thread the ingestor through the create_semantic_search_tool / create_get_function_source_tool factories and their call sites in main.py and mcp/tools.py. Behavior is unchanged — fetch_all(query, params) is the public wrapper around _execute_query(query, params).

Type of Change

Related Issues

Closes #505

Test Plan

Added test_semantic_code_search_reuses_injected_ingestor, which asserts the injected ingestor's fetch_all() is called and _execute_query() is never called — a regression guard against re-introducing a per-call connection.
Migrated the existing test_semantic_search.py suite to pass a mock ingestor directly rather than patching MemgraphIngestor.
Ran uv run pytest -m "not integration" locally: all 4019 non-integration tests pass. The only failures are the Java-oracle tests (test_java_*_oracle.py), which shell out to javac and fail solely because a JDK isn't installed in my local environment — unrelated to this change.

…li87#505)

gemini-code-assist

Code Review

This pull request refactors the semantic search and function source tools to accept an injected ingestor instance rather than instantiating MemgraphIngestor internally. This change decouples the tools from configuration settings and simplifies database query execution by utilizing fetch_all instead of _execute_query. Corresponding updates were made to the MCP tools, main initialization, and test suites. Feedback on the changes suggests using QueryProtocol instead of the concrete MemgraphIngestor class for type annotations to resolve type mismatches with static analysis tools, and using .get() with fallbacks instead of direct dictionary access when parsing query results to prevent potential KeyError exceptions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-28T21:59:42Z

+                result = results_map[node_id]
+                result_type = result["type"]
+                type_str = (
+                    result_type[0]
+                    if isinstance(result_type, list) and result_type
+                    else cs.SEMANTIC_TYPE_UNKNOWN
+                )
+                formatted_results.append(
+                    SemanticSearchResult(
+                        node_id=node_id,
+                        qualified_name=str(result["qualified_name"]),
+                        name=str(result["name"]),
+                        type=type_str,
+                        score=round(score, 3),
                    )
+                )


To prevent potential KeyError exceptions at runtime, it is safer to use .get() when retrieving fields from the query results. This is especially important for dynamic graph databases where some properties (like type, qualified_name, or name) might occasionally be missing or malformed on certain nodes.

Using .get() with appropriate fallbacks also aligns with how other fields (like path, start_line, and end_line) are defensively retrieved in get_function_source_code later in this file.

Suggested change

result = results_map[node_id]

result_type = result["type"]

type_str = (

result_type[0]

if isinstance(result_type, list) and result_type

else cs.SEMANTIC_TYPE_UNKNOWN

)

formatted_results.append(

SemanticSearchResult(

node_id=node_id,

qualified_name=str(result["qualified_name"]),

name=str(result["name"]),

type=type_str,

score=round(score, 3),

)

)

result = results_map[node_id]

result_type = result.get("type")

type_str = (

result_type[0]

if isinstance(result_type, list) and result_type

else cs.SEMANTIC_TYPE_UNKNOWN

)

formatted_results.append(

SemanticSearchResult(

node_id=node_id,

qualified_name=str(result.get("qualified_name", "")),

name=str(result.get("name", "")),

type=type_str,

score=round(score, 3),

)

)

greptile-apps · 2026-06-28T22:03:50Z

Greptile Summary

This PR reuses the existing Memgraph ingestor for semantic search tools. The main changes are:

Injects the orchestrator or MCP registry ingestor into semantic search tool factories.
Replaces private graph query calls with the public fetch_all() API.
Runs semantic search and source lookup work off the async event loop.
Updates semantic search tests to use injected mock ingestors.

Confidence Score: 5/5

The change is narrowly scoped to dependency injection and public query API usage for semantic search tooling.

No correctness issues were identified in the reviewed changes, and the updated tests cover the intended ingestor reuse behavior.

T-Rex Logs

What T-Rex did

Compared the base and head code paths for semantic_code_search and create_semantic_search_tool, and both runs returned the same two formatted results and the same tool response; the base path used MemgraphIngestor twice with two _execute_query calls and the head path used fetch_all twice.
Verified that both direct and tool paths produced the same expected source and exited with code 0, and that the logs captured the exact script content along with the command and working directory.
Observed the call-site wiring change: base code called create_semantic_search_tool() and create_get_function_source_tool() with no arguments, while head code called these factories with an ingestor argument (and similar self.ingestor variants), with the factory definitions still showing empty raw_args/args; runtime import was blocked by missing loguru, but the AST/regex probe confirmed the signature changes without starting external services.

_{Ran code and verified through T-Rex}

Comments Outside Diff (1)

General comment

Semantic search factory call sites pass ingestor to zero-argument factories
- Bug
  - At head, codebase_rag/main.py calls both semantic factory functions with ingestor, and codebase_rag/mcp/tools.py calls them with self.ingestor. However, codebase_rag/tools/semantic_search.py still defines create_semantic_search_tool() and create_get_function_source_tool() with no parameters. These changed call sites will raise TypeError: ... takes 0 positional arguments but 1 was given when semantic dependencies are available or orchestrator initialization reaches these tools.
- Cause
  - The call sites were updated to thread the existing ingestor instance, but the semantic tool factory signatures and implementations at the checked head commit were not updated to accept and use that ingestor.
- Fix
  - Update create_semantic_search_tool and create_get_function_source_tool to accept an ingestor/query protocol parameter and ensure their inner tool functions use that instance instead of any per-call/global connection path. Alternatively, revert the call-site argument change, but that would not satisfy the intended shared-ingestor contract.
_{Ran code and verified through T-Rex}

_{Reviews (2): Last reviewed commit: "Address PR review: QueryProtocol, defens..." | Re-trigger Greptile}

greptile-apps · 2026-06-28T22:03:54Z

            )

-            self._semantic_search_tool = create_semantic_search_tool()
+            self._semantic_search_tool = create_semantic_search_tool(self.ingestor)


Shared Connection Concurrent Reads

This captures the registry's long-lived ingestor for semantic search, while other MCP handlers use the same ingestor from worker threads. A semantic search request can now run fetch_all() on the event-loop thread at the same time as list_projects() or another handler runs on the same mgclient connection, and _get_cursor() only locks cursor creation, not query execution. Concurrent MCP requests can corrupt or fail graph reads because the shared connection is not serialized.

Prompt To Fix With AI

This is a comment left during a code review. Path: codebase_rag/mcp/tools.py Line: 96 Comment: **Shared Connection Concurrent Reads** This captures the registry's long-lived ingestor for semantic search, while other MCP handlers use the same ingestor from worker threads. A semantic search request can now run `fetch_all()` on the event-loop thread at the same time as `list_projects()` or another handler runs on the same mgclient connection, and `_get_cursor()` only locks cursor creation, not query execution. Concurrent MCP requests can corrupt or fail graph reads because the shared connection is not serialized. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-06-28T22:03:55Z

                self._shell_command_tool,
                self._directory_lister_tool,
-                create_get_function_source_tool(),
+                create_get_function_source_tool(self.ingestor),


Agent Tool Shares Connection

The source-code tool captured by the long-lived agent now uses the same registry ingestor as the rest of the MCP server. When ask_agent invokes this tool while another MCP request is reading or updating the graph through self.ingestor, source lookup can execute on the same connection without the registry lock. That can make source retrieval fail intermittently or break the shared Memgraph session.

Artifacts

Repro: focused shared-ingestor concurrency harness

Contains supporting evidence from the run (text/x-python; charset=utf-8).

Repro: execution log showing concurrent use of the shared ingestor

Keeps the command output available without making the summary code-heavy.

_{Ran code and verified through T-Rex}

Prompt To Fix With AI

This is a comment left during a code review. Path: codebase_rag/mcp/tools.py Line: 347 Comment: **Agent Tool Shares Connection** The source-code tool captured by the long-lived agent now uses the same registry ingestor as the rest of the MCP server. When `ask_agent` invokes this tool while another MCP request is reading or updating the graph through `self.ingestor`, source lookup can execute on the same connection without the registry lock. That can make source retrieval fail intermittently or break the shared Memgraph session. How can I resolve this? If you propose a fix, please make it concise.

…load Use QueryProtocol for ingestor typing to match main.py and other tools. Parse graph result fields with .get() fallbacks. Offload ingestor I/O via asyncio.to_thread in async tool wrappers, matching query_code_graph pattern.

vitali87

@ChetanyaRathi thank you for the PR. Please clear all greptile and gemini comments until the greptile Confidence Score becomes 5.
I will review after.

ChetanyaRathi · 2026-06-29T22:51:21Z

@greptileai please re-review. The previous pass ran against the first commit (c74e94e); the latest commit (f95bd55) addresses the open feedback:

Type hints: the ingestor params now use QueryProtocol in semantic_search.py and both tool factories, matching how main.py already annotates it (resolves the mypy/pyright mismatch).
Defensive parsing: the semantic-search result mapping uses .get() with fallbacks instead of direct key access.
Threading: the blocking fetch_all() calls in the semantic-search and source-lookup tools are offloaded via asyncio.to_thread, so they no longer run on the event loop.

On the two "shared connection" P1s, I believe these are false positives: every query on the shared ingestor is serialized by self._conn_lock in graph_service.py. The yield in _get_cursor() sits inside the with self._conn_lock: block, so the lock is held across execute() and the fetch, not just cursor creation. fetch_all() / execute_write() / list_projects() all route through it, so concurrent calls queue on the lock rather than colliding on the connection. Happy to walk through it inline if the re-run still flags them.

ChetanyaRathi · 2026-06-29T23:08:24Z

@vitali87 Greptile is now at 5/5 on the latest commit (f95bd55), and I've resolved the open review threads, also ready for your review whenever you get a chance. Thank you!

perf(query): reuse Memgraph connection in semantic search tools (vita…

c74e94e

…li87#505)

ChetanyaRathi requested a review from vitali87 as a code owner June 28, 2026 21:57

github-project-automation Bot added this to @vitali87's graph code Jun 28, 2026

gemini-code-assist Bot reviewed Jun 28, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 28, 2026

View reviewed changes

vitali87 requested changes Jun 29, 2026

View reviewed changes

vitali87 approved these changes Jun 29, 2026

View reviewed changes

vitali87 merged commit 32e50f5 into vitali87:main Jun 29, 2026
1 check passed

github-project-automation Bot moved this to Done in @vitali87's graph code Jun 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(query): reuse Memgraph connection in semantic search tools (#505)#545

perf(query): reuse Memgraph connection in semantic search tools (#505)#545
vitali87 merged 2 commits into
vitali87:mainfrom
ChetanyaRathi:perf/reuse-memgraph-conn-semantic-search

ChetanyaRathi commented Jun 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Uh oh!

greptile-apps Bot commented Jun 28, 2026 •

edited

Loading

T-Rex Logs

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 28, 2026

Uh oh!

greptile-apps Bot Jun 28, 2026

Uh oh!

vitali87 left a comment

Uh oh!

ChetanyaRathi commented Jun 29, 2026

Uh oh!

ChetanyaRathi commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

ChetanyaRathi commented Jun 28, 2026

Summary

Type of Change

Related Issues

Test Plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

T-Rex Logs

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

vitali87 left a comment

Choose a reason for hiding this comment

Uh oh!

ChetanyaRathi commented Jun 29, 2026

Uh oh!

ChetanyaRathi commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Jun 28, 2026 •

edited

Loading