Render Read tool results with pygments via structured payload (closes #170)#172
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR enhances Read tool parsing to handle structured ChangesRead tool structured parsing and rendering
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The Read tool's `tool_result.content` is `cat -n` formatted (each line prefixed with `<line_number><TAB>`), and the existing parser's regex only matched the arrow variant (`<num>→<content>`, used by Edit/Write result snippets). Read entries fell through to the generic ToolResultContent fallback — rendered as raw monospace text with no syntax highlighting, no lexer detection, no line-number alignment. Two-pronged fix in `parse_read_output`: 1. Preferred path: consume `toolUseResult.file` directly when present. That field carries byte-clean content (no `<num>\t` prefix) plus accurate `filePath`, `startLine`, `numLines`, `totalLines` metadata. Avoids the lossy cat-n round-trip and is the only path that knows `totalLines` (needed for the truncation flag). 2. Fallback path: extend `_parse_cat_n_snippet`'s regex from `\s+(\d+)→(.*)$` to `\s*(\d+)[\t→](.*)$` so it accepts both separator variants. Read on older transcripts (no `toolUseResult`) still parses correctly; Edit/Write arrow form is preserved. Register "Read" in `PARSERS_WITH_TOOL_USE_RESULT` so the factory passes the structured payload through. No changes to `format_read_output` / `render_file_content_collapsible` — the existing pygments machinery (`highlight_code_with_pygments` + `linenostart=output.start_line`) already does the right thing once `ReadOutput.content` is clean and `start_line` is correct. 12 new regression tests on `read_tool_pygments.jsonl` (built from the exact tool_use + tool_result pair in issue #170) cover both parser paths, lexer detection, line-number alignment, and edge cases (unknown extension, single-line, empty file, missing file_path).
beda168 to
2ae70c1
Compare
The ``int(file_info.get("numLines") or default)`` shortcut in the
structured-payload branch silently promoted ``numLines == 0`` (an
empty-file read) to the absent-fallback, which evaluated to
``len("".split("\n")) == 1`` — rendering an empty file as 1 line.
Same hazard on ``totalLines == 0`` and (latent) ``startLine == 0``.
Distinguish *absent* from *zero* explicitly via ``is not None``. Also
switch the absent-numLines fallback from ``split("\n")`` to
``splitlines()`` so content ending in ``\n`` (most file content) does
not tack on a phantom trailing element ("x\ny\n".splitlines() →
["x", "y"], length 2 not 3).
Two new regression tests:
- ``test_empty_file`` extended to assert ``num_lines == 0`` /
``total_lines == 0`` (previously passed for the wrong reason —
``is_truncated is False`` because both got promoted to 1).
- ``test_absent_numlines_uses_splitlines`` exercises the absent-
numLines fallback on content with a trailing newline.
User manually tested PR #172 and surfaced a pre-existing off-by-one in the Read tool's HTML title: with input ``offset=775, limit=20`` the title rendered "lines 776-795" while the actual content (correctly) showed lines 775-794. Two compounding off-by-ones in the title generation: - The start was shifted by ``+1`` (``f"line {offset + 1}"``) on the assumption that ``offset`` was 0-based. It is not — ``offset`` is the 1-based starting line number, matching what ``toolUseResult.file.startLine`` reports and what the rendered cat-n content shows. - The end was computed as ``offset + limit`` (exclusive) when it should have been ``offset + limit - 1`` (inclusive — both endpoints are shown in the rendered content). Net effect: title now agrees with the rendered content's line numbering. The ``offset=0 / None`` case (read from start of file) is preserved by treating any falsy ``offset`` as a display value of 1, which matches the actual behaviour of the Read tool when offset is absent (content starts at line 1). Markdown side has no line-range in its Read title (only filename), so no parallel fix needed. Regression test on the existing fixture asserts the title contains ``"lines 775-794"`` and not ``"lines 776-795"``.
Closes #170.
Summary
The Read tool's
tool_result.contentiscat -nformatted (each line prefixed with<line_number><TAB>), but the existing parser's regex only matched the arrow variant (<num>→<content>, used by Edit/Write result snippets). Read entries fell through to the genericToolResultContentfallback — raw monospace text, no syntax highlighting, no lexer detection, no line-number alignment.This PR fixes
parse_read_outputto:toolUseResult.filewhen present. That field carries byte-clean content (no<num>\tprefix) plus accuratefilePath,startLine,numLines,totalLinesmetadata. Avoids the lossy cat-n round-trip and is the only path that knowstotalLines(needed to flagis_truncatedcorrectly).\s+(\d+)→(.*)$to\s*(\d+)[\t→](.*)$so older transcripts (withouttoolUseResult) still parse, and the existing arrow form for Edit/Write snippets is preserved.Readis added toPARSERS_WITH_TOOL_USE_RESULTso the factory passes the structured payload through.No changes to
format_read_output/render_file_content_collapsible— the existing pygments machinery (highlight_code_with_pygmentswith extension-based lexer detection +linenostart=output.start_line) already does the right thing onceReadOutput.contentis clean andstart_lineis correct.Test plan
test/test_read_tool_pygments.pyon a fixture built from the exact tool_use + tool_result pair in Fix Read tool rendering #170:toolUseResult.fileand cat-n text fallback)linenostart, file path in collapsible headingTextLexerfallback, single-line read, empty file, missingfile_pathjust ciclean (1998 tests / 0 ruff / 0 pyright / ty clean)Note on PR #169 timing
If PR #169 (plugin system implementation) merges before this lands, the Read renderer may need to relocate under the new plugin system. The fix here is small and self-contained — relocating is mechanical. Happy to rebase as needed.
Summary by CodeRabbit
New Features
Bug Fixes / Improvements
Tests