from_single_file: CLIPTextModel has no attribute 'text_model' with transformers >= 5.6 #13833

@kappacommit

Describe the bug

With transformers >= 5.6 installed, from_single_file fails when the checkpoint's CLIP text encoder is loaded as a CLIPTextModel (e.g. SD 1.x).

In transformers 5.6, CLIPTextModel was flattened: its submodules (embeddings, encoder, final_layer_norm) are now assigned directly on the model and the text_model attribute was removed (CLIPTextModelWithProjection still has text_model, so SDXL-style encoders are unaffected). See huggingface/transformers#46285.

create_diffusers_clip_model_from_ldm in diffusers/loaders/single_file_utils.py reads model.text_model.embeddings.position_embedding.weight.shape[-1], which raises:

AttributeError: 'CLIPTextModel' object has no attribute 'text_model'
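
One possible direction for a fix is to look up the embeddings in a way that tolerates both layouts. The following is only a minimal sketch, assuming the flattened CLIPTextModel exposes `embeddings` directly as described above. `get_clip_text_root` and `position_embedding_dim` are hypothetical helper names, not existing diffusers API, and the stand-in objects only mimic the attribute layout so the sketch runs without transformers installed.

```python
from types import SimpleNamespace


def get_clip_text_root(model):
    """Return the object that owns `embeddings` in either CLIPTextModel layout.

    Hypothetical helper: the older layout nests the submodules under
    `model.text_model`; the flattened layout puts them on `model` itself.
    """
    return getattr(model, "text_model", model)


def position_embedding_dim(model):
    """Read the position embedding width without assuming `text_model` exists."""
    root = get_clip_text_root(model)
    return root.embeddings.position_embedding.weight.shape[-1]


# Stand-ins for the two layouts (assumption: only the attribute paths matter here).
def _fake_embeddings(dim):
    weight = SimpleNamespace(shape=(77, dim))
    return SimpleNamespace(position_embedding=SimpleNamespace(weight=weight))


nested = SimpleNamespace(text_model=SimpleNamespace(embeddings=_fake_embeddings(768)))
flat = SimpleNamespace(embeddings=_fake_embeddings(768))

print(position_embedding_dim(nested), position_embedding_dim(flat))
```

Falling back to the model itself keeps CLIPTextModelWithProjection and older transformers releases on the existing path, since `getattr` only uses the default when `text_model` is missing.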

diffusers declares transformers>=4.41.2 with no upper bound, so this combination installs without warning.
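
Until this is fixed, callers could detect the affected combination up front. This is a rough sketch based only on the versions reported in this issue (works on <= 5.5.x, fails on 5.6.0 and later). `clip_text_model_is_flattened` is a hypothetical name, and the threshold is an assumption taken from this report rather than from the transformers changelog.

```python
import re


def clip_text_model_is_flattened(transformers_version: str) -> bool:
    """Return True if the version is in the range this issue reports as affected (>= 5.6)."""
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {transformers_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (5, 6)


# Typical use (requires transformers to be installed):
#   import transformers
#   if clip_text_model_is_flattened(transformers.__version__):
#       ...  # warn, or pin transformers to an unaffected version before calling from_single_file
```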

Reproduction

import torch
from transformers import CLIPTextModel
from diffusers.loaders.single_file_utils import create_diffusers_clip_model_from_ldm

# Build an SD1.x-style LDM CLIP state dict: keys under "cond_stage_model.transformer.<hf-key>"
ref = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
checkpoint = {f"cond_stage_model.transformer.{k}": v for k, v in ref.state_dict().items()}

create_diffusers_clip_model_from_ldm(
    CLIPTextModel,
    checkpoint=checkpoint,
    config="openai/clip-vit-large-patch14",
    local_files_only=False,
)

This is the same code path used by StableDiffusionPipeline.from_single_file(<sd1.5 .safetensors>).

Logs

Traceback (most recent call last):
  File ".../diffusers/loaders/single_file_utils.py", line 1702, in create_diffusers_clip_model_from_ldm
    position_embedding_dim = model.text_model.embeddings.position_embedding.weight.shape[-1]
  File ".../torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(...)
AttributeError: 'CLIPTextModel' object has no attribute 'text_model'

System Info

  • diffusers: 0.37.0 (also present on main / 0.38.0; the failing line is unchanged)
  • transformers: reproduces on 5.6.0 – 5.9.0 (works on <= 5.5.x)
  • huggingface_hub: 1.17.0
  • torch: 2.7.1+cu128
  • accelerate: 1.8.1
  • Python: 3.12.9
  • Platform: Windows-11

Who can help?

No response
