feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct #9
Open
hallerite wants to merge 6 commits into
Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the unrestricted unsloth/... mirrors with byte-identical chat templates). MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta license access.

Implementation notes:

* No <think> / reasoning channel. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
* <|begin_of_text|> (BOS) is emitted at the start of every render. The system block is emitted UNCONDITIONALLY with a fixed "Cutting Knowledge Date / Today Date" preamble, even when no system message is supplied. date_string is a constructor kwarg pinned at "26 Jul 2024" by default (matching the chat template's strftime fallback); override it per instance for production runs that want today's date.
* tools_in_user_message defaults to True: tools and their JSON signatures are injected into the first user message. Pass False at construction to switch to system-block mode. Both modes are parity-tested.
* Single tool call per assistant message (the chat template raises otherwise). Tool calls render as a JSON blob inside the assistant body. Tool responses render under role ipython regardless of source role, mirroring the chat template's content|tojson branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
* parse_llama_3 detects the JSON tool-call body shape with a strict check; malformed JSON falls through to content.

47 dedicated tests covering map shape, constructor contract, byte parity across 11 conversation shapes (including tool calls, multi-turn, custom date, tools-in-system mode), parse_response, and bridge contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
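A minimal string-level sketch of the rendering rules listed in this commit, written against the published Llama-3.2 chat template rather than the PR's `Llama3Renderer` (which emits token IDs). `render_llama3_text`, `header`, `tojson`, and `is_jinja_iterable` are hypothetical helpers; the `Cutting Knowledge Date: December 2023` line is assumed from Meta's template, `tojson` is assumed to behave like `json.dumps(..., ensure_ascii=False)`, and tool-definition injection (`tools_in_user_message`, plus the `Environment: ipython` line) is omitted.

```python
import json
from typing import Any, Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    # Role header as the template writes it, including the trailing blank line.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def tojson(value: Any) -> str:
    # Assumed equivalent of the chat-template `tojson` filter.
    return json.dumps(value, ensure_ascii=False)


def is_jinja_iterable(value: Any) -> bool:
    # Mirrors Jinja's `is iterable` test: strings pass, ints and None do not.
    try:
        iter(value)
    except TypeError:
        return False
    return True


def render_llama3_text(
    messages: List[Dict[str, Any]],
    date_string: str = "26 Jul 2024",
    add_generation_prompt: bool = False,
) -> str:
    out = [BOS]
    system = ""
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"].strip()
        messages = messages[1:]
    # The system block is emitted even when no system message was supplied.
    out.append(header("system"))
    out.append("Cutting Knowledge Date: December 2023\n")
    out.append(f"Today Date: {date_string}\n\n")
    out.append(system + EOT)
    for msg in messages:
        if "tool_calls" in msg:
            if len(msg["tool_calls"]) != 1:
                raise ValueError("only one tool call per assistant message is supported")
            fn = msg["tool_calls"][0]["function"]
            body = '{"name": "' + fn["name"] + '", "parameters": ' + tojson(fn["arguments"]) + "}"
            out.append(header("assistant") + body + EOT)
        elif msg["role"] in ("tool", "ipython"):
            content = msg["content"]
            # Jinja quirk: a plain string is iterable, so it gets JSON-quoted too.
            body = tojson(content) if is_jinja_iterable(content) else str(content)
            out.append(header("ipython") + body + EOT)
        else:
            out.append(header(msg["role"]) + msg["content"].strip() + EOT)
    if add_generation_prompt:
        out.append(header("assistant"))
    return "".join(out)
```

Passing `date_string` explicitly is what keeps the output byte-stable from one day to the next.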
Resolve conflicts in renderers/__init__.py and renderers/base.py:

- Add LagunaXS2Renderer (origin/main) alongside Llama3Renderer (PR).
- Rename the Llama-3 registry key from "llama_3" to "llama-3" to match origin/main's hyphenated convention (also applied to deepseek-v3, kimi-k2, kimi-k2.5, nemotron-3, gpt-oss). Update the matching MODEL_RENDERER_MAP entries and tests/test_llama_3.py assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
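A sketch of the routing split this merge keeps, assuming a plain dict keyed by model ID: the canonical `meta-llama/...` paths map to the hyphenated `llama-3` key, while a mirror such as `unsloth/Llama-3.2-1B-Instruct` is not in the map and has to be given a key explicitly. `renderer_key_for` and its `override` parameter are hypothetical, not the repo's factory API.

```python
from typing import Dict, Optional

# Canonical checkpoints route to the hyphenated registry key.
MODEL_RENDERER_MAP: Dict[str, str] = {
    "meta-llama/Llama-3.2-1B-Instruct": "llama-3",
    "meta-llama/Llama-3.2-3B-Instruct": "llama-3",
}


def renderer_key_for(model_id: str, override: Optional[str] = None) -> str:
    # An explicit key wins, e.g. for an ungated mirror used in tests.
    if override is not None:
        return override
    try:
        return MODEL_RENDERER_MAP[model_id]
    except KeyError:
        raise KeyError(f"no renderer registered for {model_id!r}; pass override=") from None
```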
Approvability verdict: Needs human review. This PR adds a complete new Llama-3 renderer implementation (~500+ lines of new logic) with message rendering, tool-call handling, and response parsing: a new capability that warrants human review despite good test coverage. No code changes detected at
# Conflicts:
# renderers/__init__.py
# renderers/base.py
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 47cb60b.
parse_llama_3 violated the parser contract (documented in parsing.py's
module docstring: "Every parser emits list[ParsedToolCall]"). It put an
OpenAI-shaped {"function": {...}} dict in ParsedResponse.tool_calls and
used None for the no-call case. Inference-client code that filters on
ToolCallParseStatus.OK — or just iterates tool_calls — broke on every
Llama-3 completion: AttributeError on .status for the call case, and
TypeError iterating None for plain replies.
Emit a ParsedToolCall(raw, name, arguments, token_span, status=OK) for a
detected call, and fall back to the dataclass default (empty list) for
plain content and for {...} bodies that don't parse / lack a name.
Llama-3 has no tool-call delimiter to anchor a "malformed attempt"
against, so a non-tool-call body stays content rather than producing a
non-OK entry — preserving the prior fall-through-to-content behaviour.
Tests updated to the ParsedToolCall shape and empty-list contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
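A sketch of the corrected contract, using hypothetical stand-ins for `ParsedToolCall`, `ParsedResponse`, and `ToolCallParseStatus` whose field names follow this commit message (the real classes in `parsing.py` may differ). `parse_llama3_body` is also hypothetical; it takes `token_span` as a parameter instead of computing it, and defaulting missing `parameters` to `{}` is an assumption.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Tuple


class ToolCallParseStatus(Enum):
    OK = "ok"


@dataclass
class ParsedToolCall:
    raw: str
    name: str
    arguments: Any
    token_span: Tuple[int, int]
    status: ToolCallParseStatus = ToolCallParseStatus.OK


@dataclass
class ParsedResponse:
    content: str = ""
    # Default is an empty list, never None, so callers can always iterate.
    tool_calls: List[ParsedToolCall] = field(default_factory=list)


def parse_llama3_body(text: str, token_span: Tuple[int, int] = (0, 0)) -> ParsedResponse:
    body = text.strip()
    if body.startswith("{"):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "name" in obj:
            call = ParsedToolCall(
                raw=body,
                name=obj["name"],
                arguments=obj.get("parameters", {}),
                token_span=token_span,
            )
            return ParsedResponse(tool_calls=[call])
    # No delimiter to anchor a failed attempt against, so anything else stays content.
    return ParsedResponse(content=text)
```

Because `tool_calls` is always a list, a consumer loop such as `for c in resp.tool_calls: ...` works for plain replies without a `None` check.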
The Llama-3 renderer predated several contract additions and diverged from every other renderer. Bring it fully in line:

renderers/llama_3.py
* render() now emits all six RenderedTokens fields. Previously it set only token_ids + message_indices, leaving sampled_mask, is_content, message_roles, and message_tool_names empty, below even DefaultRenderer's baseline. Thread is_sampled/is_content through the emit helpers (the qwen3/laguna emit_special/emit_text/emit_text_segments trio); scaffold/body splits route through attribute_text_segments so byte parity is preserved.
* bridge_to_next_turn() now returns RenderedTokens (it previously returned list[int], which violated the Renderer protocol), with the contract attribution: prior portion -1/False, new portion indexed and content-masked, sampled_mask uniformly False.
* parse_response() gains the `*, tools=None` kwarg from the protocol.
* preserve_*_thinking flags are now no-ops instead of raising. Llama has no reasoning channel, so it follows the same never-preserves contract as Kimi-K2 / Qwen3-VL (no renderer raises on these).

renderers/parsing.py
* parse_llama_3 skips a leading assistant role-header before tool-call detection. Delimiter-based parsers tolerate that scaffold naturally; Llama's bare-JSON format needs it handled explicitly. This is a no-op on the sampled stream in production.

tests/
* Wire Llama-3 into the shared matrices (conftest RENDERER_MODELS, test_bridge, test_roundtrip) via the ungated unsloth mirror, and add it to NO_OP_MODELS + NEVER_PRESERVES_MODELS.
* Skip Llama for the generic HF-parity files (test_render_ids, test_build_helpers): its template fills the date via strftime_now, so apply_chat_template parity is non-deterministic. Deterministic byte parity (date pinned on both sides) stays in test_llama_3.py.
* Skip the parallel-tool-call round-trip (the template forbids more than one call).
* Convert the preserve-thinking rejection tests to no-op assertions.

Full suite: 1947 passed, 88 skipped, 1 xfailed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
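The bridge attribution described in this commit, sketched with a hypothetical `TokenAttribution` class that models only four per-token fields (`token_ids`, `message_indices`, `sampled_mask`, `is_content`); `message_roles` and `message_tool_names` are left out because their layout isn't described here. `bridge_attribution` is hypothetical, and splitting the new message into header, body, and `<|eot_id|>` segments with only the body marked as content is an assumption for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class TokenAttribution:
    token_ids: List[int] = field(default_factory=list)
    message_indices: List[int] = field(default_factory=list)
    sampled_mask: List[bool] = field(default_factory=list)
    is_content: List[bool] = field(default_factory=list)

    def emit(self, ids: Sequence[int], message_index: int, *, is_sampled: bool, is_content: bool) -> None:
        # Keep every per-token list the same length as token_ids.
        n = len(ids)
        self.token_ids.extend(ids)
        self.message_indices.extend([message_index] * n)
        self.sampled_mask.extend([is_sampled] * n)
        self.is_content.extend([is_content] * n)


def bridge_attribution(
    prior_ids: Sequence[int],
    new_segments: Sequence[Tuple[Sequence[int], bool]],
    new_message_index: int,
) -> TokenAttribution:
    out = TokenAttribution()
    # Prior portion: unattributed (-1) and not content.
    out.emit(prior_ids, -1, is_sampled=False, is_content=False)
    # New portion: attributed to the new message, content flag set per segment.
    for ids, content in new_segments:
        out.emit(ids, new_message_index, is_sampled=False, is_content=content)
    # Per the bridge contract, sampled_mask is uniformly False.
    return out
```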
# Conflicts:
# renderers/base.py

Summary
Hand-coded `Llama3Renderer` for Meta's Llama-3.x chat template, plus a matching `parse_llama_3` parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via `MODEL_RENDERER_MAP`). No version bump.

How tests work without a Meta-license HF token
`MODEL_RENDERER_MAP` registers the canonical `meta-llama/...` paths so production callers auto-route. Tests load the tokenizer via the unrestricted `unsloth/Llama-3.2-{1B,3B}-Instruct` mirror: its chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken-BPE files are identical, so CI doesn't need an HF_TOKEN with Meta license access.

Implementation notes
- No `<think>` / reasoning channel: Llama-3 doesn't ship one. `preserve_*_thinking` constructor flags raise `NotImplementedError` if set (matches `DefaultRenderer`'s contract for the same case).
- `<|begin_of_text|>` (BOS) is emitted at the start of every render; the system block is always emitted with the fixed `Cutting Knowledge Date / Today Date` preamble, even when no system message is supplied.
- `date_string` is a constructor kwarg, defaulting to `"26 Jul 2024"` (the chat template's `strftime` fallback) so output stays deterministic. Override it per instance for production runs that want today's date.
- `tools_in_user_message` defaults to `True` (matches the chat template): tools and their JSON signatures are injected into the first user message. Pass `False` to switch to system-block mode. Both modes are parity-tested.
- Single tool call per assistant message (the chat template raises otherwise). Tool calls render as `{"name": "...", "parameters": ...}` inside the assistant body. Tool responses render under role `ipython` regardless of source role, mirroring the chat template's `content | tojson` branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
- `parse_llama_3` detects the JSON tool-call body shape with a strict check (starts with `{` and parses as a dict with a `name` key); malformed JSON falls through to `content` rather than being silently dropped.

Tests
47 dedicated tests in `tests/test_llama_3.py`:

- `MODEL_RENDERER_MAP` shape + factory routing
- Constructor contract (`preserve_*_thinking` rejection, `tools_in_user_message` toggle)
- Byte parity against `apply_chat_template` across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
- `parse_response` (plain, tool call, malformed JSON fallthrough)

Test plan
- `pytest tests/test_llama_3.py`: 47 cases pass on both 1B and 3B mirrors
- Full suite (`pytest tests/ --ignore=tests/test_client.py`): 947 pass, 48 skipped, 1 xfailed (no regressions)
- `meta-llama/Llama-3.2-1B-Instruct` parity directly (the unsloth mirror has been bit-verified, but a one-off canonical run is good defense in depth)

🤖 Generated with Claude Code
Note
Low Risk
Additive renderer and model routing with broad test coverage; scope is limited to two Instruct checkpoints and deterministic date handling for parity.
Overview
Adds a dedicated `llama-3` rendering path for Llama 3.2 1B/3B Instruct, wiring `meta-llama/Llama-3.2-{1B,3B}-Instruct` through `MODEL_RENDERER_MAP`, the renderer registry, lazy exports, and `Llama3RendererConfig` (`date_string`, `tools_in_user_message`).

`Llama3Renderer` mirrors Meta's chat template in Python: it always emits BOS and a system preamble (knowledge/today dates), supports tools in the first user message or the system block, renders a single JSON tool call per assistant turn and tool replies as `ipython` with Jinja-style `tojson` for strings, plus `sampled_mask`/`is_content` and `bridge_to_next_turn`. `parse_llama_3` turns completions into plain text or one `{"name", "parameters"}` tool call.

Tests use the `unsloth` mirror with an explicit config; generic HF parity tests that can't pass `date_string` are skipped; Llama is folded into the bridge, roundtrip, and preserve-thinking matrices.

Reviewed by Cursor Bugbot for commit c4d9667.
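A sketch of the two config knobs as a frozen dataclass; `Llama3RendererConfigSketch` and `today_date_string` are hypothetical names, not the repo's `Llama3RendererConfig`. The `"%d %b %Y"` format is assumed to be what the chat template passes to `strftime_now`, which is why unpinned rendering changes from day to day; `%b` is locale-dependent and yields `Jul` under Python's default C locale.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Llama3RendererConfigSketch:
    # Defaults taken from the PR description.
    date_string: str = "26 Jul 2024"
    tools_in_user_message: bool = True


def today_date_string(today: Optional[date] = None) -> str:
    # Zero-padded day, abbreviated month, four-digit year, e.g. "05 Jul 2024".
    return (today or date.today()).strftime("%d %b %Y")
```

A production run that wants the current date would construct `Llama3RendererConfigSketch(date_string=today_date_string())`; tests keep the default so parity stays deterministic.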
Note
Add Llama-3 renderer for Llama-3.2-1B/3B-Instruct models
- Adds `Llama3Renderer` with full chat template rendering, response parsing, and turn-bridging for Llama-3.x Instruct models.
- Maps the `meta-llama/Llama-3.2-1B-Instruct` and `meta-llama/Llama-3.2-3B-Instruct` model IDs to the new `llama-3` renderer in `MODEL_RENDERER_MAP`.
- Adds `Llama3RendererConfig` with `date_string` (default `'26 Jul 2024'`) and `tools_in_user_message` (default `True`) options; `preserve_*_thinking` flags are no-ops for this renderer.
- Adds `parse_llama_3` to parse completions into tool calls (single JSON body with `name`/`parameters`) or plain text content.
- Only a single tool call per assistant message is supported; the renderer raises `ValueError` on multiples, and the test suite skips multi-tool-call roundtrip tests for this renderer accordingly.

Macroscope summarized c4d9667.
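A sketch of the header skip that precedes bare-JSON tool-call detection. `ASSISTANT_HEADER`, `strip_assistant_header`, and `detect_tool_call` are hypothetical helpers, and stripping exactly one leading header is an assumption about how `parse_llama_3` handles the scaffold.

```python
import json
from typing import Any, Dict, Optional

ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def strip_assistant_header(text: str) -> str:
    # Drop one leading assistant role-header, if present; sampled completions
    # normally start after it, so this is a no-op for them.
    if text.startswith(ASSISTANT_HEADER):
        return text[len(ASSISTANT_HEADER):]
    return text


def detect_tool_call(text: str) -> Optional[Dict[str, Any]]:
    # Bare-JSON format has no delimiter, so the body itself must be a dict with a name.
    body = strip_assistant_header(text).strip()
    if not body.startswith("{"):
        return None
    try:
        obj = json.loads(body)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) and "name" in obj else None
```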