I was reading the `smollm2` wiring in `examples/models/llama/export_llama_lib.py` and ran into something I can't tell is intentional or a wiring mistake, so I'm asking before doing anything that depends on either interpretation.
## What I observed in current main
```python
# examples/models/llama/export_llama_lib.py
HUGGING_FACE_REPO_IDS = {
    ...
    "smollm2": "HuggingFaceTB/SmolLM-135M",
    ...
}
```
`HuggingFaceTB/SmolLM-135M` is the SmolLM v1 repo. The SmolLM2 model from the same org lives at `HuggingFaceTB/SmolLM2-135M`; they are listed as separate repos on HuggingFace.

`examples/models/smollm2/135M_config.json` has `"rope_theta": 10000.0`, which matches v1's HF `config.json`. v2's HF `config.json` has `rope_theta: 100000`.
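The `rope_theta` mismatch is not cosmetic: it shifts every RoPE frequency except the first. A minimal sketch of the standard RoPE frequency schedule, assuming an illustrative `head_dim` of 64 (the real value comes from the model config, not from this issue):

```python
# Compare RoPE inverse frequencies under the two rope_theta values.
# head_dim = 64 is an illustrative assumption, not taken from the repo.
head_dim = 64

def rope_inv_freq(theta, dim):
    # Standard RoPE schedule: 1 / theta^(2i/dim) for i in 0 .. dim/2 - 1
    return [theta ** (-2.0 * i / dim) for i in range(dim // 2)]

v1 = rope_inv_freq(10_000.0, head_dim)   # params JSON as merged
v2 = rope_inv_freq(100_000.0, head_dim)  # SmolLM2's HF config

# The lowest frequency differs by roughly an order of magnitude, so
# long-range position encoding diverges substantially between the two.
ratio = v1[-1] / v2[-1]
print(f"lowest-frequency ratio v1/v2: {ratio:.2f}")
```

So even with identical weights, exporting with the wrong `rope_theta` would change long-context behavior.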
To be sure this wasn't just a metadata-naming thing, I downloaded both checkpoints and compared `model.safetensors`:
- Different SHA-256, different on-disk sizes (v1 ships fp32 at 538 MB, v2 ships bf16 at 269 MB).
- Identical key set: both are `LlamaForCausalLM`, 272 tensors, same shapes.
- 0 of 272 tensors are bit-identical. Per-tensor `max_abs_diff` between the two ranges from 0.67 to 10.56, well above any dtype-precision noise floor.
- First row of `model.embed_tokens.weight`:
  - v1: `[-0.379, -0.219, 0.028, -0.262, -0.231, -0.164, 0.082, -0.246]`
  - v2: `[-0.118, 0.028, 0.048, -0.008, -0.056, -0.052, 0.016, -0.134]`
So the two repos contain genuinely different weights, not the same model under two names.
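For reference, the comparison above was along these lines. This is a sketch using toy numpy arrays in place of the real checkpoints; in practice the two state dicts would be loaded from each repo's `model.safetensors` (e.g. via safetensors' `load_file`) and cast to float:

```python
import numpy as np

def compare_state_dicts(a, b):
    """Per-tensor max abs diff between two state dicts with identical keys."""
    assert set(a) == set(b), "key sets differ"
    diffs = {}
    for name in a:
        assert a[name].shape == b[name].shape, f"shape mismatch at {name}"
        diffs[name] = float(np.max(np.abs(a[name].astype(np.float64)
                                          - b[name].astype(np.float64))))
    return diffs

# Toy stand-ins for the two checkpoints; the real run iterated over all
# 272 tensors from the two downloaded model.safetensors files.
v1 = {"embed": np.array([[-0.379, -0.219, 0.028]], dtype=np.float32)}
v2 = {"embed": np.array([[-0.118, 0.028, 0.048]], dtype=np.float32)}

diffs = compare_state_dicts(v1, v2)
bit_identical = sum(d == 0.0 for d in diffs.values())
print(f"{bit_identical} of {len(diffs)} tensors bit-identical; "
      f"max max_abs_diff: {max(diffs.values()):.3f}")
```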
## What confused me when I looked at the original PR
Looking at the seeding PR #9354 ("Add SmolLM (smollm2)"), the review trail looks like the intent was SmolLM2 (v2):
- The PR was originally submitted with the directory named `examples/models/smollm/` (the v1 family name). The reviewer asked the author to rename it to `smollm2`:
  - "rename this and directory to smolllm2" — Reviewer
  - "Ah - it should be smollm2*" — Reviewer
- While reviewing the params JSON, the reviewer cross-referenced the SmolLM2 HuggingFace config to validate `hidden_dim`:
  - "Should be 1536 - https://huggingface.co/HuggingFaceTB/SmolLM2-135M/blob/main/config.json#L12" — Reviewer
So from the review history it looks like the reviewer believed they were merging SmolLM2 (v2) support. But the resulting params JSON has v1's `rope_theta`, and the `HUGGING_FACE_REPO_IDS` entry points at v1's repo. The fields the reviewer actually checked (`hidden_dim`, `use_hf_rope`, tied embeddings, `model_type`) all happen to be identical between v1 and v2, so a v1-vs-v2 mismatch on the unchecked fields wouldn't have shown up in review.
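The review blind spot can be made concrete with a field-by-field diff of the two configs. In the sketch below only the two `rope_theta` values come from the observations above; the shared fields are illustrative placeholders standing in for the fields the reviewer checked, and a real check would fetch both repos' `config.json` files:

```python
# Field-by-field diff of two config dicts. Only the rope_theta values are
# taken from the actual configs; the shared fields are placeholders.
v1_cfg = {"model_type": "llama", "rope_theta": 10000.0, "tie_word_embeddings": True}
v2_cfg = {"model_type": "llama", "rope_theta": 100000.0, "tie_word_embeddings": True}

def config_diff(a, b):
    """Return {key: (a_value, b_value)} for every key where the configs disagree."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Only keys appearing in the diff could have exposed the v1/v2 mix-up in review.
print(config_diff(v1_cfg, v2_cfg))
```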
## My question
Is the `smollm2` alias here intentionally meant to point at SmolLM v1 (in which case the naming is just historical and I should treat it that way), or is this an unnoticed wiring mistake where the alias was supposed to land on SmolLM2 but ended up on v1?