Fix: model fails to load when chat template uses HuggingFace generation tags#2226
Open
tobocop2 wants to merge 1 commit into
Open
Fix: model fails to load when chat template uses HuggingFace generation tags#2226tobocop2 wants to merge 1 commit into
tobocop2 wants to merge 1 commit into
Conversation
HuggingFace's transformers chat-template extension adds {% generation %}
and {% endgeneration %} tags so trainers can mark generation spans for
loss masking. The tags ship in GGUF tokenizer.chat_template metadata
(SmolLM3 et al), but jinja2's default environment doesn't recognize
them, so Llama() raises TemplateSyntaxError at init for any affected
GGUF, even when the caller passes an explicit chat_format override.
Register a minimal Jinja extension that treats both tags as inert
wrappers: the body between them renders as-is, the markers themselves
emit nothing. No behavioral change for templates that don't use the
tags.
Prior art: PR abetlen#2082 attempted the same approach but referenced an
unimported 'nodes' module and didn't consume the body or closing tag.
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
GGUFs whose embedded
tokenizer.chat_templateuses{% generation %}/{% endgeneration %}(SmolLM3 and other HF-shipped models) fail to load withTemplateSyntaxErrorinLlama.__init__, even when the caller passes achat_formatoverride.Jinja2ChatFormattereagerly compiles every embedded template.Solution
Register a Jinja extension that treats both tags as inert wrappers: the body renders as-is, the markers emit nothing. No behavioral change for templates that don't use the tags.
Relationship to #2082
PR #2082 attempted the same approach but is incomplete:
parse()method referencesnodes.Const("")without importingnodesfromjinja2, so it wouldNameErroron first use.parser.stream.skip(1); return nodes.Const("")consumes only the tag name. It never advances past the body or the closing{% endgeneration %}, so the parser is left in a broken state and the next template construct fails to parse.This PR addresses both: imports
nodesandExtensionexplicitly, and consumes the body viaparser.parse_statements(("name:endgeneration",), drop_needle=True)so the wrapped content renders and the parser advances past the closing tag. Includes a unit test that fails today and passes with the fix.Closes #2225.