P fix list_directory_from_gh in case no directory is returned from API#76
P fix list_directory_from_gh in case no directory is returned from API#76
Conversation
There was a problem hiding this comment.
Pull request overview
This PR hardens the gh_file_viewer MCP server’s directory-listing tool to handle GitHub Contents API responses where the requested path is not a directory, and adds a comprehensive unit-test suite for the module.
Changes:
- Update
list_directory_from_ghto handle empty results and non-list JSON payloads (e.g., file metadata) without crashing. - Add new
tests/test_gh_file_viewer.pycovering file fetch, line slicing, search (file/repo), directory listing, and helper utilities.
Show a summary per file
| File | Description |
|---|---|
src/seclab_taskflows/mcp_servers/gh_file_viewer.py |
Prevents failures when the Contents API returns a non-directory payload by validating the decoded JSON type before iterating. |
tests/test_gh_file_viewer.py |
Introduces new unit tests for the gh_file_viewer tools and helper functions, including the newly handled “path is a file” case. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
| class TestFetchFileFromGh: | ||
| @pytest.mark.asyncio | ||
| async def test_fetch_file_success(self): | ||
| resp = _make_response(text=SAMPLE_FILE_CONTENT) | ||
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock, return_value=resp): | ||
| result = await gfv_mod.fetch_file_from_gh.fn(owner="Owner", repo="Repo", path="src/main.py") | ||
| assert "1: import os" in result |
There was a problem hiding this comment.
This test module introduces many async def tests marked with @pytest.mark.asyncio. Unless the test environment installs and configures pytest-asyncio, pytest will error when collecting/running coroutine tests (especially in CI via hatch test). Please ensure pytest-asyncio is included in the test dependencies (e.g., Hatch test env deps / optional test extra) so these tests run on all CI platforms.
| resp = _make_response(text="") | ||
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock, return_value=resp): | ||
| result = await gfv_mod.search_files_from_gh.fn( | ||
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | ||
| ) | ||
| # empty string split yields [""], which hits the API for an empty path | ||
| assert "No matches found" in result |
There was a problem hiding this comment.
test_search_files_no_paths currently asserts the behavior where paths="" still triggers an API call and returns "No matches found". In search_files_from_gh, the guard if not paths_list: is effectively unreachable because "".split(",") yields [""]. Consider filtering out empty/whitespace-only paths and returning the intended "No paths provided for search." message, then update this test to match the corrected behavior.
| resp = _make_response(text="") | |
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock, return_value=resp): | |
| result = await gfv_mod.search_files_from_gh.fn( | |
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | |
| ) | |
| # empty string split yields [""], which hits the API for an empty path | |
| assert "No matches found" in result | |
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock) as mock_call_api: | |
| result = await gfv_mod.search_files_from_gh.fn( | |
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | |
| ) | |
| assert result == "No paths provided for search." | |
| mock_call_api.assert_not_awaited() |
There was a problem hiding this comment.
Pull request overview
This PR hardens the gh_file_viewer MCP server’s GitHub “contents” directory listing behavior when the GitHub API returns an empty payload or a non-directory payload (e.g., a single file object), and adds a dedicated test suite to validate the module’s tools end-to-end via mocked API calls.
Changes:
- Fix
list_directory_from_ghto safely handle empty API JSON responses and “path is a file” responses (dict vs list). - Add a comprehensive new
tests/test_gh_file_viewer.pycovering file fetch, line slicing, searching, directory listing, repo zip searching, and helper utilities. - Make table creation explicitly
checkfirst=Truefor theSearchResultstable initialization.
Show a summary per file
| File | Description |
|---|---|
src/seclab_taskflows/mcp_servers/gh_file_viewer.py |
Handles empty/non-list JSON from the GitHub contents API to avoid runtime errors when listing directories. |
tests/test_gh_file_viewer.py |
Adds coverage for gh_file_viewer tool functions, including the newly fixed directory listing edge cases. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 0 new
There was a problem hiding this comment.
Pull request overview
This PR improves the GitHub file viewer MCP server’s robustness when listing repository “directories” via the GitHub Contents API, and adds a dedicated test suite for the module.
Changes:
- Update
list_directory_from_ghto handle empty API responses and non-directory JSON payloads. - Add comprehensive unit tests for
gh_file_viewertool functions (file fetch, line ranges, search, directory listing, repo zip search). - Register the
xdist_grouppytest marker inpyproject.tomlto support grouping tests onto one xdist worker.
Show a summary per file
| File | Description |
|---|---|
| tests/test_gh_file_viewer.py | Adds a full unit test module for gh_file_viewer, including directory-list edge cases. |
| src/seclab_taskflows/mcp_servers/gh_file_viewer.py | Hardens directory listing behavior and adds import-time DB init error handling. |
| pyproject.toml | Adds pytest marker configuration for xdist_group. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 2
| try: | ||
| Base.metadata.create_all(engine, tables=[SearchResults.__table__]) | ||
| except Exception as e: | ||
| logging.exception(f"Database already exists") # only log here, as this error likely only happens in test |
There was a problem hiding this comment.
The module-level try/except around Base.metadata.create_all(...) catches all exceptions and then continues, which can hide real database setup failures (e.g., missing/invalid DB path, permission issues) and lead to later runtime errors when writing/querying. If this is meant to tolerate a known concurrent-create race, catch the specific SQLAlchemy exception(s) and only ignore the specific "already exists" case; otherwise re-raise after logging. Also, the log message "Database already exists" is misleading for most exceptions coming from create_all.
| resp = _make_response(text="") | ||
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock, return_value=resp): | ||
| result = await gfv_mod.search_files_from_gh.fn( | ||
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | ||
| ) | ||
| # empty string split yields [""], which hits the API for an empty path | ||
| assert "No matches found" in result |
There was a problem hiding this comment.
This test case documents that paths="" ends up calling the GitHub API with an empty path (because "".split(",") produces [""]). That behavior is almost certainly unintended (the implementation even has a "No paths provided" branch that never triggers for empty input), and encoding it in a test will make it harder to fix later. Consider filtering out empty/whitespace-only paths (and asserting the intended "No paths provided"/similar message here) instead of validating the current accidental behavior.
| resp = _make_response(text="") | |
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock, return_value=resp): | |
| result = await gfv_mod.search_files_from_gh.fn( | |
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | |
| ) | |
| # empty string split yields [""], which hits the API for an empty path | |
| assert "No matches found" in result | |
| with patch.object(gfv_mod, "call_api", new_callable=AsyncMock) as mock_call_api: | |
| result = await gfv_mod.search_files_from_gh.fn( | |
| owner="owner", repo="repo", paths="", search_term="import", save_to_db=False, | |
| ) | |
| assert "No paths provided" in result | |
| mock_call_api.assert_not_awaited() |
There was a problem hiding this comment.
Pull request overview
This PR improves robustness of the gh_file_viewer MCP server by handling GitHub “contents” API responses that are not directory listings, and adds a comprehensive test suite for the module.
Changes:
- Add
list_directory_from_ghhandling for empty/non-list API JSON payloads (e.g., when the path is a file). - Add new
tests/test_gh_file_viewer.pycovering GitHub file viewer behaviors (fetch/search/list/search repo). - Update test environment config in
pyproject.toml(pytest marker registration + extra test dependency).
Show a summary per file
| File | Description |
|---|---|
src/seclab_taskflows/mcp_servers/gh_file_viewer.py |
Adds safer handling for directory listing API payloads and wraps DB table creation in an exception handler. |
tests/test_gh_file_viewer.py |
Introduces async and sync tests for the GitHub file viewer MCP tools and helpers. |
pyproject.toml |
Adds hatch test env extra dependency and registers custom pytest marker used by xdist grouping. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 1
| try: | ||
| Base.metadata.create_all(engine, tables=[SearchResults.__table__]) | ||
| except sqlalchemy.exc.OperationalError as e: | ||
| logging.exception(f"Database/Tables already exist(s)") # only log here, as this error likely only happens in test |
There was a problem hiding this comment.
Catching all sqlalchemy.exc.OperationalError from Base.metadata.create_all(...) and then continuing can mask real DB init failures (e.g., permission errors, missing/readonly directory, DB locked/corrupt). Consider only suppressing the specific “table already exists” race case (inspect the exception message/orig), and re-raise otherwise (and/or ensure SEARCH_RESULT_DIR exists before creating the engine). Also logging.exception here is very noisy for an expected condition; prefer a warning/info log that includes the actual exception details.
See below for a potential fix:
os.makedirs(SEARCH_RESULT_DIR, exist_ok=True)
def _is_table_already_exists_error(error: sqlalchemy.exc.OperationalError) -> bool:
error_messages = [str(error).lower()]
if getattr(error, "orig", None) is not None:
error_messages.append(str(error.orig).lower())
return any(
"already exists" in message and "table" in message
for message in error_messages
)
engine = create_engine(f"sqlite:///{os.path.abspath(SEARCH_RESULT_DIR)}/search_result.db", echo=False)
try:
Base.metadata.create_all(engine, tables=[SearchResults.__table__])
except sqlalchemy.exc.OperationalError as e:
if _is_table_already_exists_error(e):
logging.warning("Search results table already exists during initialization: %s", e)
else:
raise
No description provided.