Render Read tool results with pygments via structured payload (closes #170) #172

Merged: cboos merged 3 commits into main from dev/issue-170-read-tool-pygments on May 25, 2026

cboos (Collaborator) commented on May 25, 2026

Closes #170.

Summary

The Read tool's tool_result.content is cat -n formatted (each line prefixed with <line_number><TAB>), but the existing parser's regex only matched the arrow variant (<num>→<content>, used by Edit/Write result snippets). Read entries fell through to the generic ToolResultContent fallback — raw monospace text, no syntax highlighting, no lexer detection, no line-number alignment.

This PR fixes parse_read_output to:

  1. Prefer toolUseResult.file when present. That field carries byte-clean content (no <num>\t prefix) plus accurate filePath, startLine, numLines, totalLines metadata. Avoids the lossy cat-n round-trip and is the only path that knows totalLines (needed to flag is_truncated correctly).
  2. Extend the cat-n fallback regex from \s+(\d+)→(.*)$ to \s*(\d+)[\t→](.*)$ so older transcripts (without toolUseResult) still parse, and the existing arrow form for Edit/Write snippets is preserved.
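The effect of the regex change can be checked in isolation. The sketch below uses the two patterns quoted above; the helper `parse_line`, the use of `re.match`, and the sample lines are illustrative assumptions, not the project's actual `_parse_cat_n_snippet` code.

```python
from __future__ import annotations

import re

# Old pattern: needs leading whitespace and accepts only the arrow separator.
OLD_CAT_N = re.compile(r"\s+(\d+)→(.*)$")
# New pattern: leading whitespace is optional, separator is a tab or an arrow.
NEW_CAT_N = re.compile(r"\s*(\d+)[\t→](.*)$")


def parse_line(pattern: re.Pattern[str], line: str) -> tuple[int, str] | None:
    """Return (line_number, content) when the line matches, else None.

    Hypothetical helper for illustration only.
    """
    m = pattern.match(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


read_line = "   775\tdef main():"      # Read tool: <num><TAB><content>
edit_line = "   12→    return value"   # Edit/Write snippet: <num>→<content>

print(parse_line(OLD_CAT_N, read_line))  # None: the old regex rejects tabs
print(parse_line(NEW_CAT_N, read_line))  # (775, 'def main():')
print(parse_line(NEW_CAT_N, edit_line))  # (12, '    return value')
```

The arrow form still matches under the new pattern, which is what the "must not regress" requirement for Edit/Write snippets relies on.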

Read is added to PARSERS_WITH_TOOL_USE_RESULT so the factory passes the structured payload through.

No changes to format_read_output / render_file_content_collapsible — the existing pygments machinery (highlight_code_with_pygments with extension-based lexer detection + linenostart=output.start_line) already does the right thing once ReadOutput.content is clean and start_line is correct.

Test plan

  • 12 new regression tests in test/test_read_tool_pygments.py on a fixture built from the exact tool_use + tool_result pair in issue #170 (Fix Read tool rendering):
    • Both parser paths (structured toolUseResult.file and cat-n text fallback)
    • Tab separator (Read, Claude Code 2.1.x+) and arrow separator (Edit/Write — must not regress)
    • HTML rendering with pygments python lexer, correct linenostart, file path in collapsible heading
    • Edge cases: unknown extension → TextLexer fallback, single-line read, empty file, missing file_path
  • just ci clean (1998 tests / 0 ruff / 0 pyright / ty clean)
  • Snapshot tests pass without update — no existing snapshot fixture contained a Read tool result, so the parser change has no observable effect there
  • Manual sanity check: render a real transcript containing Read calls (e.g. the session screenshot from issue #170, Fix Read tool rendering) and confirm syntax highlighting + correct starting line numbers appear

Note on PR #169 timing

If PR #169 (plugin system implementation) merges before this lands, the Read renderer may need to relocate under the new plugin system. The fix here is small and self-contained — relocating is mechanical. Happy to rebase as needed.

Summary by CodeRabbit

  • New Features

    • Preserve structured file metadata for read results so parsed outputs retain exact file paths and line counts.
  • Bug Fixes / Improvements

    • Accept multiple transcript line-separator formats (arrow and tab) for wider compatibility.
    • Fix displayed line-range rendering to use inclusive start/end semantics and correct single-line wording.
  • Tests

    • Add end-to-end tests covering structured parsing, fallback parsing, rendering, and edge cases (empty/single-line reads).

coderabbitai (Bot) commented on May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 80d822cc-5e54-4a05-b790-033c2569dfce

📥 Commits

Reviewing files that changed from the base of the PR and between 4e42fd0 and fa30a60.

📒 Files selected for processing (2)
  • claude_code_log/html/renderer.py
  • test/test_read_tool_pygments.py

📝 Walkthrough

The PR enhances Read tool parsing to handle structured toolUseResult.file metadata while maintaining backward compatibility with legacy cat-n text format. The parser recognizes both tab and arrow separators, is registered to receive structured payloads, and is validated end-to-end with tests covering parsing paths, rendering, and edge cases.

Changes

Read tool structured parsing and rendering

| Layer / File(s) | Summary |
|---|---|
| Parser registration and cat-n separator support (claude_code_log/factories/tool_factory.py, test/test_read_tool_pygments.py) | Regex updated to accept both `\t` and `→` separators when parsing cat-n snippet lines. Read parser added to PARSERS_WITH_TOOL_USE_RESULT registry so create_tool_output passes tool_use_result into parse_read_output. Registration test confirms wiring. |
| Read output parsing with structured and fallback paths (claude_code_log/factories/tool_factory.py, test/test_data/read_tool_pygments.jsonl, test/test_read_tool_pygments.py) | parse_read_output now accepts optional tool_use_result, extracts metadata and content from toolUseResult.file when present (filePath, startLine, numLines, totalLines, is_truncated), and falls back to cat-n regex parsing for older transcripts. Test fixture data and parsing tests cover structured preferred path, minimal structured fields, fallback with both separator variants, non-cat-n content rejection, and missing file_path handling. |
| Rendering and edge-case validation (claude_code_log/html/renderer.py, test/test_read_tool_pygments.py) | Title rendering for Read inputs now formats inclusive 1-based line ranges. End-to-end HTML rendering tests verify Python lexer highlighting, correct starting line presence, file path surfacing, and absence of leaked raw cat-n `<num>\t` prefixes. Tests also cover unknown-extension fallback, single-line reads, empty-file behavior (numLines == 0), and missing numLines fallback using splitlines(). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 The Read tool now sees structured light,
Both tab and arrow separators bright,
Metadata flows where Pygments can shine,
Edge cases handled, each line by line.
From claude-code-log, a rendering delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly matches the main objective: implementing pygments rendering for Read tool results via structured payload, as confirmed by issue #170. |
| Linked Issues check | ✅ Passed | All requirements from #170 are implemented: parsing structured toolUseResult.file for clean content and metadata, supporting both tab and arrow separators in cat-n fallback, adding Read to PARSERS_WITH_TOOL_USE_RESULT, and enabling pygments rendering with correct line numbers and lexer detection. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to #170: parser updates for structured payloads, cat-n regex expansion, registry adjustment, title rendering logic, test data, and comprehensive test coverage. No unrelated modifications detected. |


The Read tool's `tool_result.content` is `cat -n` formatted (each line
prefixed with `<line_number><TAB>`), and the existing parser's regex only
matched the arrow variant (`<num>→<content>`, used by Edit/Write result
snippets). Read entries fell through to the generic ToolResultContent
fallback — rendered as raw monospace text with no syntax highlighting,
no lexer detection, no line-number alignment.

Two-pronged fix in `parse_read_output`:

1. Preferred path: consume `toolUseResult.file` directly when present.
   That field carries byte-clean content (no `<num>\t` prefix) plus
   accurate `filePath`, `startLine`, `numLines`, `totalLines` metadata.
   Avoids the lossy cat-n round-trip and is the only path that knows
   `totalLines` (needed for the truncation flag).

2. Fallback path: extend `_parse_cat_n_snippet`'s regex from
   `\s+(\d+)→(.*)$` to `\s*(\d+)[\t→](.*)$` so it accepts both
   separator variants. Read on older transcripts (no `toolUseResult`)
   still parses correctly; Edit/Write arrow form is preserved.

Register "Read" in `PARSERS_WITH_TOOL_USE_RESULT` so the factory passes
the structured payload through. No changes to `format_read_output` /
`render_file_content_collapsible` — the existing pygments machinery
(`highlight_code_with_pygments` + `linenostart=output.start_line`)
already does the right thing once `ReadOutput.content` is clean and
`start_line` is correct.
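
A minimal sketch of the two paths, assuming the payload shape described above (`file.content`, `filePath`, `startLine`, `numLines`, `totalLines`). The names `ReadResult` and `parse_read` are hypothetical stand-ins for the project's `ReadOutput` and `parse_read_output`, and the truncation rule (`num_lines < total_lines`) is an assumption about how the flag could be derived, not the confirmed implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Optional

CAT_N_LINE = re.compile(r"\s*(\d+)[\t→](.*)$")


@dataclass
class ReadResult:  # hypothetical stand-in for ReadOutput
    file_path: str
    content: str
    start_line: int
    num_lines: int
    total_lines: Optional[int]
    is_truncated: bool


def parse_read(
    file_path: str,
    text: str,
    tool_use_result: Optional[dict[str, Any]] = None,
) -> Optional[ReadResult]:
    file_info = (tool_use_result or {}).get("file")
    if isinstance(file_info, dict) and "content" in file_info:
        # Preferred path: content is already clean, metadata is explicit.
        content = file_info["content"]
        start = file_info.get("startLine")
        num = file_info.get("numLines")
        total = file_info.get("totalLines")
        num_lines = num if num is not None else len(content.splitlines())
        return ReadResult(
            file_path=file_info.get("filePath", file_path),
            content=content,
            start_line=start if start is not None else 1,
            num_lines=num_lines,
            total_lines=total,
            # Assumed rule: truncated when fewer lines were read than exist.
            is_truncated=total is not None and num_lines < total,
        )

    # Fallback path: strip the "<num><sep>" prefix from every line.
    numbers: list[int] = []
    lines: list[str] = []
    for raw in text.splitlines():
        m = CAT_N_LINE.match(raw)
        if m is None:
            return None  # not cat-n formatted; caller uses the generic renderer
        numbers.append(int(m.group(1)))
        lines.append(m.group(2))
    if not numbers:
        return None
    # totalLines is unknown here, so truncation cannot be detected.
    return ReadResult(file_path, "\n".join(lines), numbers[0], len(lines), None, False)


cat_n_text = "   775\tx = 1\n   776\ty = 2"
payload = {"file": {"filePath": "/tmp/app.py", "content": "x = 1\ny = 2\n",
                    "startLine": 775, "numLines": 2, "totalLines": 900}}
structured = parse_read("/tmp/app.py", cat_n_text, payload)
legacy = parse_read("/tmp/app.py", cat_n_text)
```

In this sketch only the structured result can report `is_truncated=True`, because the fallback never sees `totalLines`.
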

12 new regression tests on `read_tool_pygments.jsonl` (built from the
exact tool_use + tool_result pair in issue #170) cover both parser
paths, lexer detection, line-number alignment, and edge cases
(unknown extension, single-line, empty file, missing file_path).

cboos force-pushed the dev/issue-170-read-tool-pygments branch from beda168 to 2ae70c1 on May 25, 2026 08:41

The ``int(file_info.get("numLines") or default)`` shortcut in the
structured-payload branch silently promoted ``numLines == 0`` (an
empty-file read) to the absent-fallback, which evaluated to
``len("".split("\n")) == 1`` — rendering an empty file as 1 line.
Same hazard on ``totalLines == 0`` and (latent) ``startLine == 0``.

Distinguish *absent* from *zero* explicitly via ``is not None``. Also
switch the absent-numLines fallback from ``split("\n")`` to
``splitlines()`` so content ending in ``\n`` (most file content) does
not tack on a phantom trailing element ("x\ny\n".splitlines() →
["x", "y"], length 2 not 3).

Two new regression tests:

- ``test_empty_file`` extended to assert ``num_lines == 0`` /
  ``total_lines == 0`` (previously passed for the wrong reason —
  ``is_truncated is False`` because both got promoted to 1).
- ``test_absent_numlines_uses_splitlines`` exercises the absent-
  numLines fallback on content with a trailing newline.

cboos marked this pull request as ready for review on May 25, 2026 08:48

User manually tested PR #172 and surfaced a pre-existing off-by-one in
the Read tool's HTML title: with input ``offset=775, limit=20`` the
title rendered "lines 776-795" while the actual content (correctly)
showed lines 775-794. Two compounding off-by-ones in the title
generation:

- The start was shifted by ``+1`` (``f"line {offset + 1}"``) on the
  assumption that ``offset`` was 0-based. It is not — ``offset`` is
  the 1-based starting line number, matching what
  ``toolUseResult.file.startLine`` reports and what the rendered cat-n
  content shows.
- The end was computed as ``offset + limit`` (exclusive) when it
  should have been ``offset + limit - 1`` (inclusive — both endpoints
  are shown in the rendered content).

Net effect: title now agrees with the rendered content's line
numbering. The ``offset=0 / None`` case (read from start of file) is
preserved by treating any falsy ``offset`` as a display value of 1,
which matches the actual behaviour of the Read tool when offset is
absent (content starts at line 1).
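
The arithmetic can be sketched as follows. The function names are hypothetical, and the wording for the single-line and no-limit cases is an assumption; only the "lines 775-794" and "lines 776-795" strings are taken from this commit.

```python
from __future__ import annotations

from typing import Optional


def old_title_range(offset: int, limit: int) -> str:
    # Previous behaviour: start shifted by +1, end treated as exclusive.
    return f"lines {offset + 1}-{offset + limit}"


def new_title_range(offset: Optional[int], limit: Optional[int]) -> str:
    start = offset or 1  # offset of 0 or None means "from the first line"
    if limit is None:
        return f"from line {start}"   # assumed wording
    if limit == 1:
        return f"line {start}"        # assumed single-line wording
    end = start + limit - 1           # inclusive end: both endpoints are shown
    return f"lines {start}-{end}"


print(old_title_range(775, 20))   # lines 776-795
print(new_title_range(775, 20))   # lines 775-794
print(new_title_range(None, 20))  # lines 1-20
```
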

Markdown side has no line-range in its Read title (only filename), so
no parallel fix needed.

Regression test on the existing fixture asserts the title contains
``"lines 775-794"`` and not ``"lines 776-795"``.

cboos merged commit 4ff8e96 into main on May 25, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Fix Read tool rendering

1 participant