feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct by hallerite · Pull Request #9 · PrimeIntellect-ai/renderers

hallerite · 2026-05-07T17:39:20Z

Summary

Hand-coded Llama3Renderer for Meta's Llama-3.x chat template, plus matching parse_llama_3 parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via MODEL_RENDERER_MAP). No version bump.

How tests work without a Meta-license HF token

MODEL_RENDERER_MAP registers the canonical meta-llama/... paths so production callers auto-route. Tests load the tokenizer via the unrestricted unsloth/Llama-3.2-{1B,3B}-Instruct mirror — the chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken-BPE files are identical. CI doesn't need an HF_TOKEN with Meta license access.
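One way to spot-check the mirror claim is to hash the two chat-template strings and compare digests. The sketch below is illustrative only: the PR does not say which hash was used, so SHA-256 is an assumption, and `template_sha256` / `templates_match` are hypothetical helper names rather than anything in this repository.

```python
import hashlib


def template_sha256(chat_template: str) -> str:
    """Hex digest of a chat-template string (SHA-256 is an assumed choice)."""
    return hashlib.sha256(chat_template.encode("utf-8")).hexdigest()


def templates_match(canonical: str, mirror: str) -> bool:
    """True when both template strings hash to the same digest."""
    return template_sha256(canonical) == template_sha256(mirror)
```

In practice the two strings would come from each tokenizer's `chat_template` attribute after loading it with `transformers.AutoTokenizer.from_pretrained`; loading the canonical `meta-llama/...` repo still requires a token with Meta license access.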

Implementation notes

No <think> / reasoning channel — Llama-3 doesn't ship one. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
<|begin_of_text|> (BOS) is emitted at the start of every render; system block is always emitted with the fixed Cutting Knowledge Date / Today Date preamble even when no system message is supplied.
date_string is a constructor kwarg, defaulting to "26 Jul 2024" (the chat template's strftime fallback) so output stays deterministic. Override per-instance for production runs that want today's date.
tools_in_user_message defaults to True (matches chat template). Tools + JSON signatures inject into the first user message; pass False to flip to system-block mode. Both modes parity-tested.
Single tool call per assistant message (chat template raises otherwise). Tool calls render as a JSON blob {"name": "...", "parameters": ...} inside the assistant body. Tool responses render under role ipython regardless of source role; mirrors the chat template's content | tojson branch — including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
parse_llama_3 detects the JSON tool-call body shape with a strict starts-with-{ + parses-as-dict-with-name check; malformed JSON falls through to content rather than dropping silently.
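The rendering rules above can be sketched at the string level. This is not the PR's Llama3Renderer, which emits token ids and per-token attribution; `render_llama3_text` is a hypothetical helper that covers only conversations without tool definitions. The special-token strings and the "Cutting Knowledge Date: December 2023" line are recalled from the publicly distributed Llama-3.2 chat template and should be checked against the tokenizer. `json.dumps(..., ensure_ascii=False)` is used as an approximation of the template's `tojson` filter.

```python
import json

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def _header(role: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def render_llama3_text(messages, date_string="26 Jul 2024",
                       add_generation_prompt=False) -> str:
    """String-level approximation of the Llama-3.2 template (no tool definitions)."""
    messages = list(messages)
    system_text = ""
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"].strip()
        messages = messages[1:]

    # BOS plus the system block are emitted even without a system message.
    out = [BOS, _header("system"),
           "Cutting Knowledge Date: December 2023\n",  # assumed template constant
           f"Today Date: {date_string}\n\n",
           system_text, EOT]

    for msg in messages:
        calls = msg.get("tool_calls")
        if calls:
            if len(calls) != 1:
                raise ValueError("only one tool call per assistant message")
            fn = calls[0]["function"]
            args = json.dumps(fn["arguments"], ensure_ascii=False)
            body = '{"name": "' + fn["name"] + '", "parameters": ' + args + "}"
            out += [_header("assistant"), body, EOT]
        elif msg["role"] in ("tool", "ipython"):
            content = msg["content"]
            # Strings count as iterable in Jinja, so they get JSON-quoted too.
            if isinstance(content, (str, dict, list)):
                content = json.dumps(content, ensure_ascii=False)
            out += [_header("ipython"), str(content), EOT]
        else:
            out += [_header(msg["role"]), msg["content"].strip(), EOT]

    if add_generation_prompt:
        out.append(_header("assistant"))
    return "".join(out)
```

Pinning `date_string` is what keeps the output comparable across runs; a caller that wants today's date would pass something like `datetime.now().strftime("%d %b %Y")` instead.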

Tests

47 dedicated tests in tests/test_llama_3.py:

MODEL_RENDERER_MAP shape + factory routing
Constructor contract (default date, preserve_*_thinking rejection, tools_in_user_message toggle)
Byte parity vs apply_chat_template across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
parse_response (plain, tool call, malformed JSON fallthrough)
Bridge contract (extends prev verbatim, matches fresh render, rejects assistant in extension, synthesises close on truncation)

Test plan

pytest tests/test_llama_3.py — 47 cases pass on both 1B and 3B mirrors
Full suite (pytest tests/ --ignore=tests/test_client.py) — 947 pass, 48 skipped, 1 xfailed (no regressions)
Pre-commit hooks (ruff check + format) clean
Maintainer with Meta-license HF_TOKEN can verify meta-llama/Llama-3.2-1B-Instruct parity directly (the unsloth mirror has been bit-verified, but a once-off canonical run is good defense in depth)

🤖 Generated with Claude Code

Note

Low Risk
Additive renderer and model routing with broad test coverage; scope is limited to two Instruct checkpoints and deterministic date handling for parity.

Overview
Adds a dedicated llama-3 rendering path for Llama 3.2 1B/3B Instruct, wiring meta-llama/Llama-3.2-{1B,3B}-Instruct through MODEL_RENDERER_MAP, the renderer registry, lazy exports, and Llama3RendererConfig (date_string, tools_in_user_message).

Llama3Renderer mirrors Meta’s chat template in Python: always emits BOS and a system preamble (knowledge/today dates), supports tools in the first user message or system block, single JSON tool-call per assistant turn, tool replies as ipython with Jinja-style tojson for strings, plus sampled_mask / is_content and bridge_to_next_turn. parse_llama_3 turns completions into plain text or one {"name", "parameters"} tool call.

Tests use the unsloth mirror with an explicit config; generic HF parity tests that can’t pass date_string are skipped; Llama is folded into bridge, roundtrip, and preserve-thinking matrices.

Reviewed by Cursor Bugbot for commit c4d9667. Bugbot is set up for automated code reviews on this repo.

Note

Add Llama-3 renderer for Llama-3.2-1B/3B-Instruct models

Adds Llama3Renderer with full chat template rendering, response parsing, and turn-bridging for Llama-3.x Instruct models.
Routes meta-llama/Llama-3.2-1B-Instruct and meta-llama/Llama-3.2-3B-Instruct model IDs to the new llama-3 renderer in MODEL_RENDERER_MAP.
Adds Llama3RendererConfig with date_string (default '26 Jul 2024') and tools_in_user_message (default True) options; preserve_*_thinking flags are no-ops for this renderer.
Adds parse_llama_3 to parse completions into tool calls (single JSON body with name/parameters) or plain text content.
Enforces a single tool call per assistant message, raising ValueError on multiples; test suite skips multi-tool-call roundtrip tests for this renderer accordingly.

Macroscope summarized c4d9667.

Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the unrestricted unsloth/... mirrors with byte-identical chat templates). MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta license access.

Implementation notes:

* No <think> / reasoning channel: preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
* <|begin_of_text|> (BOS) is emitted at the start of every render. The system block is emitted UNCONDITIONALLY with a fixed "Cutting Knowledge Date / Today Date" preamble even when no system message is supplied. date_string is a constructor kwarg pinned at "26 Jul 2024" by default (matches the chat template's strftime fallback); override per instance for production runs that want today's date.
* tools_in_user_message defaults to True. Tools + JSON signatures inject into the first user message; pass False at construction to flip to system-block mode. Both modes parity-tested.
* Single tool call per assistant message (chat template raises otherwise). Tool calls render as a JSON blob inside the assistant body. Tool responses render under role ipython regardless of source role; this mirrors the chat template's content|tojson branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
* parse_llama_3 detects the JSON tool-call body shape with a strict check; malformed JSON falls through to content.

47 dedicated tests covering map shape, constructor contract, byte parity across 11 conversation shapes (including tool calls, multi-turn, custom date, tools-in-system mode), parse_response, and bridge contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolve conflicts in renderers/__init__.py and renderers/base.py:

- Add LagunaXS2Renderer (origin/main) alongside Llama3Renderer (PR).
- Rename Llama-3 registry key from "llama_3" to "llama-3" to match origin/main's hyphenated convention (also applied to deepseek-v3, kimi-k2, kimi-k2.5, nemotron-3, gpt-oss). Update the matching MODEL_RENDERER_MAP entries and tests/test_llama_3.py assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

macroscopeapp · 2026-05-20T13:43:58Z

Approvability

Verdict: Needs human review

This PR adds a complete new Llama-3 renderer implementation (~500+ lines of new logic) with message rendering, tool call handling, and response parsing - a new feature introducing new capability that warrants human review despite good test coverage.

No code changes detected at c4d9667. Prior analysis still applies.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 47cb60b.

parse_llama_3 violated the parser contract (documented in parsing.py's module docstring: "Every parser emits list[ParsedToolCall]"). It put an OpenAI-shaped {"function": {...}} dict in ParsedResponse.tool_calls and used None for the no-call case. Inference-client code that filters on ToolCallParseStatus.OK, or just iterates tool_calls, broke on every Llama-3 completion: AttributeError on .status for the call case, and TypeError iterating None for plain replies.

Emit a ParsedToolCall(raw, name, arguments, token_span, status=OK) for a detected call, and fall back to the dataclass default (empty list) for plain content and for {...} bodies that don't parse / lack a name. Llama-3 has no tool-call delimiter to anchor a "malformed attempt" against, so a non-tool-call body stays content rather than producing a non-OK entry, preserving the prior fall-through-to-content behaviour.

Tests updated to the ParsedToolCall shape and empty-list contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
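A text-level sketch of the detection rule and the empty-list contract described in this commit message. The class names mirror the ones mentioned above, but the definitions here are stand-ins written for illustration (the real ones live in parsing.py and also carry a token_span, which is omitted), and `parse_llama3_text` is a hypothetical name.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class ToolCallParseStatus(Enum):  # stand-in for the real enum
    OK = "ok"


@dataclass
class ParsedToolCall:  # stand-in; the real class also has token_span
    raw: str
    name: str
    arguments: Any
    status: ToolCallParseStatus = ToolCallParseStatus.OK


@dataclass
class ParsedResponse:  # stand-in; tool_calls defaults to an empty list
    content: str = ""
    tool_calls: list = field(default_factory=list)


def parse_llama3_text(text: str) -> ParsedResponse:
    """Treat the body as a tool call only if it is a JSON dict with a name."""
    body = text.strip()
    if body.startswith("{"):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            obj = None  # malformed JSON falls through to plain content
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            call = ParsedToolCall(raw=body, name=obj["name"],
                                  arguments=obj.get("parameters", {}))
            return ParsedResponse(content="", tool_calls=[call])
    # Plain reply or non-tool-call body: content only, tool_calls stays [].
    return ParsedResponse(content=text)
```

Because `tool_calls` is always a list, callers can iterate it or filter on `status` without a None check, which is the failure mode the commit fixes.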

The Llama-3 renderer predated several contract additions and diverged from every other renderer. Bring it fully in line:

renderers/llama_3.py

* render() now emits all six RenderedTokens fields. Previously it set only token_ids + message_indices, leaving sampled_mask, is_content, message_roles, and message_tool_names empty, which is below even DefaultRenderer's baseline. Thread is_sampled/is_content through the emit helpers (the qwen3/laguna emit_special/emit_text/emit_text_segments trio); scaffold/body splits route through attribute_text_segments so byte-parity is preserved.
* bridge_to_next_turn() now returns RenderedTokens (it previously returned list[int], violating the Renderer protocol), with the contract attribution (prior portion -1/False, new portion indexed + content-masked, sampled_mask uniformly False).
* parse_response() gains the `*, tools=None` kwarg from the protocol.
* preserve_*_thinking flags are now no-ops instead of raising. Llama has no reasoning channel, so it follows the same never-preserves contract as Kimi-K2 / Qwen3-VL (no renderer raises on these).

renderers/parsing.py

* parse_llama_3 skips a leading assistant role-header before tool-call detection. Delimiter-based parsers tolerate that scaffold naturally; Llama's bare-JSON format needs it explicit. No-op on the sampled stream in production.

tests/

* Wire Llama-3 into the shared matrices (conftest RENDERER_MODELS, test_bridge, test_roundtrip) via the ungated unsloth mirror, and add it to NO_OP_MODELS + NEVER_PRESERVES_MODELS.
* Skip Llama for the generic HF-parity files (test_render_ids, test_build_helpers): its template fills the date via strftime_now, so apply_chat_template parity is non-deterministic. Deterministic byte-parity (date pinned on both sides) stays in test_llama_3.py.
* Skip the parallel-tool-call round-trip (template forbids >1 call).
* Convert the preserve-thinking rejection tests to no-op assertions.

Full suite: 1947 passed, 88 skipped, 1 xfailed.
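The bridge attribution described above can be read as follows; this is an interpretation of the commit message, not code from the PR. `RenderedTokensSketch` is a hypothetical stand-in carrying four of the six fields (message_roles and message_tool_names are omitted), and `bridge_attribution` with its `(message_index, token_ids, is_content)` segment tuples is an assumed shape chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RenderedTokensSketch:  # hypothetical stand-in for RenderedTokens
    token_ids: list = field(default_factory=list)
    message_indices: list = field(default_factory=list)
    sampled_mask: list = field(default_factory=list)
    is_content: list = field(default_factory=list)


def bridge_attribution(prev_ids, new_segments) -> RenderedTokensSketch:
    """Extend prev_ids verbatim and attribute only the newly added tokens.

    new_segments: iterable of (message_index, token_ids, is_content) tuples.
    """
    out = RenderedTokensSketch()
    # Prior portion: kept as-is, attributed to no message and not content.
    for tok in prev_ids:
        out.token_ids.append(tok)
        out.message_indices.append(-1)
        out.is_content.append(False)
        out.sampled_mask.append(False)
    # New portion: indexed by message, content-masked, never sampled.
    for msg_idx, ids, content in new_segments:
        for tok in ids:
            out.token_ids.append(tok)
            out.message_indices.append(msg_idx)
            out.is_content.append(content)
            out.sampled_mask.append(False)
    return out
```

For example, bridging `[1, 2, 3]` with a scaffold segment `(0, [10, 11], False)` and a body segment `(0, [12], True)` keeps the first three ids unchanged and marks only token 12 as content.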
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hallerite and others added 2 commits May 7, 2026 17:38

hallerite marked this pull request as draft May 20, 2026 13:33

hallerite marked this pull request as ready for review May 20, 2026 13:33

cursor Bot reviewed May 20, 2026

Comment thread renderers/llama_3.py Outdated

Merge remote-tracking branch 'origin/main' into feat/llama-3-renderer

47cb60b

# Conflicts:
#   renderers/__init__.py
#   renderers/base.py

cursor Bot reviewed Jun 3, 2026

Comment thread renderers/parsing.py Outdated

hallerite and others added 3 commits June 4, 2026 04:41

Merge remote-tracking branch 'origin/main' into feat/llama-3-renderer

c4d9667

# Conflicts:
#   renderers/base.py

hallerite wants to merge 6 commits into main from feat/llama-3-renderer
