from_single_file: CLIPTextModel has no attribute 'text_model' with transformers >= 5.6

### Describe the bug

`from_single_file` fails with transformers >= 5.6 installed when the checkpoint's CLIP text encoder is a `CLIPTextModel` (e.g. SD 1.x).

In transformers 5.6, `CLIPTextModel` was flattened: its submodules (`embeddings`, `encoder`, `final_layer_norm`) are now assigned directly on the model and the `text_model` attribute was removed (`CLIPTextModelWithProjection` still has `text_model`, so SDXL-style encoders are unaffected). See huggingface/transformers#46285.

`create_diffusers_clip_model_from_ldm` in `diffusers/loaders/single_file_utils.py` reads `model.text_model.embeddings.position_embedding.weight.shape[-1]`, which raises:

```
AttributeError: 'CLIPTextModel' object has no attribute 'text_model'
```
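One version-tolerant way to read the position-embedding size is to resolve the module that owns `embeddings` before indexing into it. The sketch below assumes only the layouts described above: a nested `text_model` on transformers <= 5.5.x and on `CLIPTextModelWithProjection`, and flat attributes on `CLIPTextModel` in >= 5.6. `get_clip_text_transformer` and `clip_position_embedding_dim` are hypothetical names, not diffusers API. `SimpleNamespace` stubs stand in for real models so the sketch runs without torch or transformers.

```python
from types import SimpleNamespace


def get_clip_text_transformer(model):
    """Return the module that owns `embeddings`, `encoder` and `final_layer_norm`.

    Hypothetical helper, not part of diffusers. On transformers <= 5.5.x (and for
    CLIPTextModelWithProjection) these live under `model.text_model`; on the
    flattened CLIPTextModel in transformers >= 5.6 they sit on the model itself.
    """
    return getattr(model, "text_model", model)


def clip_position_embedding_dim(model):
    # Same expression as in create_diffusers_clip_model_from_ldm, but it resolves
    # the owning module first instead of assuming `model.text_model` exists.
    owner = get_clip_text_transformer(model)
    return owner.embeddings.position_embedding.weight.shape[-1]


def _stub_embeddings(max_positions, dim):
    # Stand-in for nn.Embedding: only `.weight.shape` is needed here.
    weight = SimpleNamespace(shape=(max_positions, dim))
    return SimpleNamespace(position_embedding=SimpleNamespace(weight=weight))


# Nested layout (transformers <= 5.5.x) vs. flattened layout (transformers >= 5.6)
nested = SimpleNamespace(text_model=SimpleNamespace(embeddings=_stub_embeddings(77, 768)))
flat = SimpleNamespace(embeddings=_stub_embeddings(77, 768))

print(clip_position_embedding_dim(nested), clip_position_embedding_dim(flat))  # 768 768
```

`getattr` with a default also works on a real `torch.nn.Module`, because its `__getattr__` raises `AttributeError` for missing attributes. This sketch covers only the attribute access that raises here. Other parts of the single-file path, such as state-dict key prefixes that contain `text_model.`, may need separate handling.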

`diffusers` declares `transformers>=4.41.2` with no upper bound, so this combination installs without warning.
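Until a fix lands, a stopgap is to detect the affected range and pin `transformers<5.6`. The sketch below is a hypothetical helper and is not part of either library. It takes the 5.6 threshold from the versions reported under System Info and assumes that later releases keep the flattened layout. To stay stdlib-only, it parses only the leading numeric release components. `packaging.version` would be the more robust choice where it is available.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption from this report: 5.6.0-5.9.0 fail, <= 5.5.x work, and later
# releases are assumed to keep the flattened CLIPTextModel.
FLATTENED_CLIP_SINCE = (5, 6)


def parse_release(ver):
    """Leading numeric components only: "5.6.0" -> (5, 6, 0), "5.6.0.dev0" -> (5, 6, 0)."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. "0rc1": stop after the numeric prefix
            break
    return tuple(parts)


def clip_text_model_is_flattened(ver):
    # Compare (major, minor) against the assumed threshold.
    return parse_release(ver)[:2] >= FLATTENED_CLIP_SINCE


def installed_transformers_flattens_clip():
    try:
        return clip_text_model_is_flattened(version("transformers"))
    except PackageNotFoundError:
        return False


if installed_transformers_flattens_clip():
    print('Affected: consider pinning "transformers<5.6" until diffusers is fixed')
```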

### Reproduction

```python
import torch
from transformers import CLIPTextModel
from diffusers.loaders.single_file_utils import create_diffusers_clip_model_from_ldm

# Build an SD1.x-style LDM CLIP state dict: keys under "cond_stage_model.transformer.<hf-key>"
ref = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
checkpoint = {f"cond_stage_model.transformer.{k}": v for k, v in ref.state_dict().items()}

create_diffusers_clip_model_from_ldm(
    CLIPTextModel,
    checkpoint=checkpoint,
    config="openai/clip-vit-large-patch14",
    local_files_only=False,
)
```

This is the same code path used by `StableDiffusionPipeline.from_single_file(<sd1.5 .safetensors>)`.

### Logs

```shell
Traceback (most recent call last):
  File ".../diffusers/loaders/single_file_utils.py", line 1702, in create_diffusers_clip_model_from_ldm
    position_embedding_dim = model.text_model.embeddings.position_embedding.weight.shape[-1]
  File ".../torch/nn/modules/module.py", line 1940, in __getattr__
    raise AttributeError(...)
AttributeError: 'CLIPTextModel' object has no attribute 'text_model'
```

### System Info

- diffusers: 0.37.0 (also present on main / 0.38.0, where the same line is unchanged)
- transformers: reproduces on 5.6.0 – 5.9.0 (works on <= 5.5.x)
- huggingface_hub: 1.17.0
- torch: 2.7.1+cu128
- accelerate: 1.8.1
- Python: 3.12.9
- Platform: Windows-11

### Who can help?

_No response_