
feat: add Llama-3 renderer for Llama-3.2-1B/3B-Instruct #9

Open
hallerite wants to merge 6 commits into main from feat/llama-3-renderer

hallerite (Member) commented May 7, 2026

Summary

Hand-coded Llama3Renderer for Meta's Llama-3.x chat template, plus a matching parse_llama_3 parser. Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (auto-routed via MODEL_RENDERER_MAP). No version bump.

How tests work without a Meta-license HF token

MODEL_RENDERER_MAP registers the canonical meta-llama/... paths so production callers auto-route. Tests load the tokenizer from the unrestricted unsloth/Llama-3.2-{1B,3B}-Instruct mirror: its chat-template SHA matches Meta's bit-for-bit and the underlying tiktoken BPE files are identical, so CI doesn't need an HF_TOKEN with Meta license access.
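A minimal sketch of the routing split described above, assuming MODEL_RENDERER_MAP is a plain dict from Hugging Face model ID to renderer registry key ("llama-3" is the key named in the later conflict-resolution commit). UNGATED_TOKENIZER_MIRROR, resolve_renderer_key, ci_load_plan, and the "default" fallback are hypothetical names and behavior for illustration, not the package's API.

```python
# Hypothetical shape: the real map may hold more entries or richer values.
MODEL_RENDERER_MAP: dict[str, str] = {
    "meta-llama/Llama-3.2-1B-Instruct": "llama-3",
    "meta-llama/Llama-3.2-3B-Instruct": "llama-3",
}

# Hypothetical test-side table: the ungated mirror supplies the tokenizer,
# while the renderer key is still derived from the canonical (gated) ID.
UNGATED_TOKENIZER_MIRROR: dict[str, str] = {
    "meta-llama/Llama-3.2-1B-Instruct": "unsloth/Llama-3.2-1B-Instruct",
    "meta-llama/Llama-3.2-3B-Instruct": "unsloth/Llama-3.2-3B-Instruct",
}


def resolve_renderer_key(model_id: str, fallback: str = "default") -> str:
    """Return the registry key for a model ID, or `fallback` if it is unmapped."""
    return MODEL_RENDERER_MAP.get(model_id, fallback)


def ci_load_plan(canonical_id: str) -> tuple[str, str]:
    """(tokenizer repo to download, renderer key) for a CI run without HF_TOKEN."""
    return UNGATED_TOKENIZER_MIRROR[canonical_id], resolve_renderer_key(canonical_id)
```

Keeping the mirror out of the production map means only canonical IDs auto-route; tests opt into the mirror explicitly.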

Implementation notes

  • No <think> / reasoning channel — Llama-3 doesn't ship one. preserve_*_thinking constructor flags raise NotImplementedError if set (matches DefaultRenderer's contract for the same case).
  • <|begin_of_text|> (BOS) is emitted at the start of every render, and the system block is always emitted with the fixed Cutting Knowledge Date / Today Date preamble, even when no system message is supplied.
  • date_string is a constructor kwarg, defaulting to "26 Jul 2024" (the chat template's strftime fallback) so output stays deterministic. Override per-instance for production runs that want today's date.
  • tools_in_user_message defaults to True (matches the chat template). Tools and their JSON signatures are injected into the first user message; pass False to switch to system-block mode. Both modes are parity-tested.
  • Single tool call per assistant message (the chat template raises otherwise). Tool calls render as a JSON blob {"name": "...", "parameters": ...} inside the assistant body. Tool responses render under role ipython regardless of source role, mirroring the chat template's content | tojson branch, including the Jinja quirk that strings are iterable, so plain-string tool content gets JSON-quoted.
  • parse_llama_3 detects the JSON tool-call body with a strict check (the body starts with {, parses as a JSON object, and has a name key); malformed JSON falls through to content rather than being silently dropped.
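The unconditional BOS and system preamble can be illustrated with a string-level sketch for tool-free conversations. This is not Llama3Renderer (which works on token IDs); render_plain and header are hypothetical helpers, and the "Cutting Knowledge Date: December 2023" line is recalled from Meta's published Llama-3.2 template rather than taken from this PR, so treat it as an assumption.

```python
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    """Role header in the Llama-3 chat format."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def render_plain(messages: list[dict], date_string: str = "26 Jul 2024",
                 add_generation_prompt: bool = False) -> str:
    """Render a conversation without tools, following the Llama-3.2 template."""
    system_text = ""
    if messages and messages[0]["role"] == "system":
        system_text = messages[0]["content"].strip()  # the template trims it
        messages = messages[1:]
    parts = [
        BOS,  # emitted on every render
        header("system"),
        "Cutting Knowledge Date: December 2023\n",  # assumed template constant
        f"Today Date: {date_string}\n\n",  # pinned by default for determinism
        system_text,  # empty string when no system message was supplied
        EOT,
    ]
    for m in messages:
        parts.append(header(m["role"]) + m["content"].strip() + EOT)
    if add_generation_prompt:
        parts.append(header("assistant"))
    return "".join(parts)
```

Passing a different date_string changes only the Today Date line, which is why pinning it keeps renders byte-stable across days.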

Tests

47 dedicated tests in tests/test_llama_3.py:

  • MODEL_RENDERER_MAP shape + factory routing
  • Constructor contract (default date, preserve_*_thinking rejection, tools_in_user_message toggle)
  • Byte parity vs apply_chat_template across 11 conversation shapes (system + user, user-only, multi-turn, gen prompt, whitespace trimming, custom date, tools-in-user, tools-in-system, tool call round-trip, dict tool response, multiple-tool-calls rejection)
  • parse_response (plain, tool call, malformed JSON fallthrough)
  • Bridge contract (extends prev verbatim, matches fresh render, rejects assistant in extension, synthesises close on truncation)

Test plan

  • pytest tests/test_llama_3.py — 47 cases pass on both 1B and 3B mirrors
  • Full suite (pytest tests/ --ignore=tests/test_client.py) — 947 pass, 48 skipped, 1 xfailed (no regressions)
  • Pre-commit hooks (ruff check + format) clean
  • Maintainer with Meta-license HF_TOKEN can verify meta-llama/Llama-3.2-1B-Instruct parity directly (the unsloth mirror has been bit-verified, but a once-off canonical run is good defense in depth)

🤖 Generated with Claude Code


Note

Low Risk
Additive renderer and model routing with broad test coverage; scope is limited to two Instruct checkpoints and deterministic date handling for parity.

Overview
Adds a dedicated llama-3 rendering path for Llama 3.2 1B/3B Instruct, wiring meta-llama/Llama-3.2-{1B,3B}-Instruct through MODEL_RENDERER_MAP, the renderer registry, lazy exports, and Llama3RendererConfig (date_string, tools_in_user_message).

Llama3Renderer mirrors Meta’s chat template in Python: always emits BOS and a system preamble (knowledge/today dates), supports tools in the first user message or system block, single JSON tool-call per assistant turn, tool replies as ipython with Jinja-style tojson for strings, plus sampled_mask / is_content and bridge_to_next_turn. parse_llama_3 turns completions into plain text or one {"name", "parameters"} tool call.

Tests use the unsloth mirror with an explicit config; generic HF parity tests that can’t pass date_string are skipped; Llama is folded into bridge, roundtrip, and preserve-thinking matrices.

Reviewed by Cursor Bugbot for commit c4d9667.

Note

Add Llama-3 renderer for Llama-3.2-1B/3B-Instruct models

  • Adds Llama3Renderer with full chat template rendering, response parsing, and turn-bridging for Llama-3.x Instruct models.
  • Routes meta-llama/Llama-3.2-1B-Instruct and meta-llama/Llama-3.2-3B-Instruct model IDs to the new llama-3 renderer in MODEL_RENDERER_MAP.
  • Adds Llama3RendererConfig with date_string (default '26 Jul 2024') and tools_in_user_message (default True) options; preserve_*_thinking flags are no-ops for this renderer.
  • Adds parse_llama_3 to parse completions into tool calls (single JSON body with name/parameters) or plain text content.
  • Enforces a single tool call per assistant message, raising ValueError on multiples; test suite skips multi-tool-call roundtrip tests for this renderer accordingly.

Macroscope summarized c4d9667.

hallerite and others added 2 commits May 7, 2026 17:38
Hand-coded Llama3Renderer mirroring Meta's Llama-3.x chat template.
Initial scope: Llama-3.2-1B-Instruct and Llama-3.2-3B-Instruct (and the
unrestricted unsloth/... mirrors with byte-identical chat templates).
MODEL_RENDERER_MAP routes the canonical meta-llama paths; tests load
via the unsloth mirrors so CI doesn't need an HF_TOKEN with Meta
license access.

Implementation notes:

* No <think> / reasoning channel — preserve_*_thinking constructor
  flags raise NotImplementedError if set (matches DefaultRenderer's
  contract for the same case).

* <|begin_of_text|> (BOS) is emitted at the start of every render. The
  system block is emitted UNCONDITIONALLY with a fixed
  "Cutting Knowledge Date / Today Date" preamble even when no system
  message is supplied. date_string is a constructor kwarg pinned at
  "26 Jul 2024" by default (matches the chat template's strftime
  fallback); override per instance for production runs that want
  today's date.

* tools_in_user_message defaults to True. Tools + JSON signatures
  inject into the first user message; pass False at construction to
  flip to system-block mode. Both modes parity-tested.

* Single tool call per assistant message (chat template raises
  otherwise). Tool calls render as a JSON blob inside the assistant
  body. Tool responses render under role ipython regardless of source
  role; mirrors the chat template's content|tojson branch including
  the Jinja quirk that strings are iterable so plain-string tool
  content gets JSON-quoted.

* parse_llama_3 detects the JSON tool-call body shape with a strict
  check; malformed JSON falls through to content.
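The tool-call and tool-response rules above can be sketched at the string level. The helper names are hypothetical, the input is assumed to be OpenAI-style {"function": {"name", "arguments"}} with dict arguments, and Jinja's tojson is approximated with json.dumps(..., ensure_ascii=False).

```python
import json

EOT = "<|eot_id|>"


def header(role: str) -> str:
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def render_tool_call_turn(tool_calls: list[dict]) -> str:
    """Assistant turn carrying exactly one tool call as a bare JSON body."""
    if len(tool_calls) != 1:
        raise ValueError("Llama-3 supports a single tool call per assistant message")
    fn = tool_calls[0]["function"]  # assumes OpenAI-style {"function": {...}}
    body = ('{"name": "' + fn["name"] + '", "parameters": '
            + json.dumps(fn["arguments"], ensure_ascii=False) + "}")
    return header("assistant") + body + EOT


def render_tool_response_turn(content) -> str:
    """Tool result, always under the ipython role whatever the source role."""
    # Jinja's `is iterable` test is true for strings too, so a plain string
    # takes the tojson branch and comes out JSON-quoted.
    if isinstance(content, (dict, list, tuple, str)):
        body = json.dumps(content, ensure_ascii=False)
    else:
        body = str(content)
    return header("ipython") + body + EOT
```

The string branch is the surprising one: a tool that returns 72F is rendered as "72F" with the quotes, matching what apply_chat_template produces.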

47 dedicated tests covering map shape, constructor contract, byte
parity across 11 conversation shapes (including tool calls, multi-turn,
custom date, tools-in-system mode), parse_response, and bridge
contract. Full suite: 947 passed, 48 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolve conflicts in renderers/__init__.py and renderers/base.py:
- Add LagunaXS2Renderer (origin/main) alongside Llama3Renderer (PR).
- Rename Llama-3 registry key from "llama_3" to "llama-3" to match
  origin/main's hyphenated convention (also applied to deepseek-v3,
  kimi-k2, kimi-k2.5, nemotron-3, gpt-oss). Update the matching
  MODEL_RENDERER_MAP entries and tests/test_llama_3.py assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite marked this pull request as draft May 20, 2026 13:33
@hallerite hallerite marked this pull request as ready for review May 20, 2026 13:33
Comment thread renderers/llama_3.py Outdated
macroscopeapp Bot commented May 20, 2026

Approvability

Verdict: Needs human review

This PR adds a complete new Llama-3 renderer implementation (500+ lines of new logic) covering message rendering, tool-call handling, and response parsing. As a new feature introducing new capability, it warrants human review despite good test coverage.

No code changes detected at c4d9667. Prior analysis still applies.


# Conflicts:
#	renderers/__init__.py
#	renderers/base.py
cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 47cb60b.

Comment thread renderers/parsing.py Outdated
hallerite and others added 3 commits June 4, 2026 04:41
parse_llama_3 violated the parser contract (documented in parsing.py's
module docstring: "Every parser emits list[ParsedToolCall]"). It put an
OpenAI-shaped {"function": {...}} dict in ParsedResponse.tool_calls and
used None for the no-call case. Inference-client code that filters on
ToolCallParseStatus.OK — or just iterates tool_calls — broke on every
Llama-3 completion: AttributeError on .status for the call case, and
TypeError iterating None for plain replies.

Emit a ParsedToolCall(raw, name, arguments, token_span, status=OK) for a
detected call, and fall back to the dataclass default (empty list) for
plain content and for {...} bodies that don't parse / lack a name.
Llama-3 has no tool-call delimiter to anchor a "malformed attempt"
against, so a non-tool-call body stays content rather than producing a
non-OK entry — preserving the prior fall-through-to-content behaviour.

Tests updated to the ParsedToolCall shape and empty-list contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
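A sketch of the parser contract this commit describes: a detected call becomes a ParsedToolCall with status OK, and anything else stays content with an empty tool_calls list, never None. The dataclasses are minimal stand-ins whose field names follow the commit message (types are assumed); parse_llama_3_sketch is hypothetical, leaves token_span unset, assumes the stop token is already stripped, and requires name to be a string, which is slightly stricter than "has a name".

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple


class ToolCallParseStatus(Enum):  # stand-in with only OK; the real enum has non-OK values
    OK = "ok"


@dataclass
class ParsedToolCall:  # field names from the commit message; types assumed
    raw: str
    name: str
    arguments: dict
    token_span: Optional[Tuple[int, int]] = None
    status: ToolCallParseStatus = ToolCallParseStatus.OK


@dataclass
class ParsedResponse:  # hypothetical minimal shape
    content: str
    tool_calls: list = field(default_factory=list)  # empty list, never None


def parse_llama_3_sketch(text: str) -> ParsedResponse:
    """Bare-JSON tool-call detection with fall-through to plain content."""
    body = text.strip()
    if body.startswith("{"):
        try:
            obj = json.loads(body)
        except json.JSONDecodeError:
            obj = None  # no delimiter to anchor a non-OK entry, so keep as content
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            call = ParsedToolCall(raw=body, name=obj["name"],
                                  arguments=obj.get("parameters", {}))
            return ParsedResponse(content="", tool_calls=[call])
    return ParsedResponse(content=text)
```

Because tool_calls is always a list, client code can iterate it or filter on status without special-casing plain replies.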
The Llama-3 renderer predated several contract additions and diverged
from every other renderer. Bring it fully in line:

renderers/llama_3.py
* render() now emits all six RenderedTokens fields. Previously it set
  only token_ids + message_indices, leaving sampled_mask, is_content,
  message_roles, and message_tool_names empty — below even
  DefaultRenderer's baseline. Thread is_sampled/is_content through the
  emit helpers (the qwen3/laguna emit_special/emit_text/
  emit_text_segments trio); scaffold/body splits route through
  attribute_text_segments so byte-parity is preserved.
* bridge_to_next_turn() now returns RenderedTokens; it previously
  returned list[int], which violated the Renderer protocol. The result
  carries the contract attribution (prior portion -1/False, new portion
  indexed + content-masked, sampled_mask uniformly False).
* parse_response() gains the `*, tools=None` kwarg from the protocol.
* preserve_*_thinking flags are now no-ops instead of raising — Llama
  has no reasoning channel, the same never-preserves contract as
  Kimi-K2 / Qwen3-VL (no renderer raises on these).
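The bridge attribution rule can be sketched on the four per-token fields. TokenAttribution and bridge_attribution are hypothetical names; message_roles and message_tool_names are omitted because their shape isn't described here, and the real method tokenizes the new messages itself rather than taking pre-split segments.

```python
from dataclasses import dataclass


@dataclass
class TokenAttribution:
    """Per-token fields of a RenderedTokens-like result (hypothetical shape)."""
    token_ids: list
    message_indices: list
    sampled_mask: list
    is_content: list


def bridge_attribution(prev_ids: list, new_segments: list) -> TokenAttribution:
    """prev_ids: tokens from the prior render, kept verbatim.
    new_segments: (ids, message_index, is_content) for each scaffold/body piece."""
    ids = list(prev_ids)
    msg_idx = [-1] * len(prev_ids)      # prior portion: index -1
    content = [False] * len(prev_ids)   # prior portion: not content
    for seg_ids, idx, is_body in new_segments:
        ids.extend(seg_ids)
        msg_idx.extend([idx] * len(seg_ids))       # new portion: indexed
        content.extend([is_body] * len(seg_ids))   # body tokens content-masked
    sampled = [False] * len(ids)        # nothing in a bridge was sampled
    return TokenAttribution(ids, msg_idx, sampled, content)
```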

renderers/parsing.py
* parse_llama_3 skips a leading assistant role header before tool-call
  detection. Delimiter-based parsers tolerate that scaffold naturally;
  Llama's bare-JSON format needs the skip to be explicit. This is a
  no-op on the sampled stream in production.
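Why the skip matters: a bare-JSON detector keys on the body starting with "{", so a leading role header hides the call. A hypothetical helper, assuming the header text matches the chat template's assistant header exactly:

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>\n\n"


def strip_leading_assistant_header(text: str) -> str:
    """Drop one leading assistant header so the body can start with '{'."""
    if text.startswith(ASSISTANT_HEADER):
        return text[len(ASSISTANT_HEADER):]
    return text  # sampled completions normally have no header: no-op
```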

tests/
* Wire Llama-3 into the shared matrices (conftest RENDERER_MODELS,
  test_bridge, test_roundtrip) via the ungated unsloth mirror, and add
  it to NO_OP_MODELS + NEVER_PRESERVES_MODELS.
* Skip Llama for the generic HF-parity files (test_render_ids,
  test_build_helpers): its template fills the date via strftime_now, so
  apply_chat_template parity is non-deterministic — deterministic
  byte-parity (date pinned on both sides) stays in test_llama_3.py.
* Skip the parallel-tool-call round-trip (template forbids >1 call).
* Convert the preserve-thinking rejection tests to no-op assertions.
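The non-determinism behind those skips is easy to see: an unpinned template date follows the wall clock, while a pinned one is constant. The format string "%d %b %Y" is an assumption inferred from the shape of the "26 Jul 2024" default; the helper names are hypothetical.

```python
from datetime import date, datetime

# Assumed to match the template's strftime_now format: zero-padded day,
# abbreviated month name, four-digit year, as in "26 Jul 2024".
TEMPLATE_DATE_FORMAT = "%d %b %Y"
PINNED_DATE_STRING = "26 Jul 2024"


def format_template_date(d: date) -> str:
    # %b is locale-dependent; under Python's default C locale it yields "Jul".
    return d.strftime(TEMPLATE_DATE_FORMAT)


def unpinned_date_string() -> str:
    """What strftime_now-style rendering sees: it changes with the wall clock."""
    return format_template_date(datetime.now().date())
```

A production run that wants today's date could pass unpinned_date_string() as the renderer's date_string; parity tests keep the pinned value on both sides.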

Full suite: 1947 passed, 88 skipped, 1 xfailed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>