Improve trust_remote_code #13448

Open
hlky wants to merge 4 commits into huggingface:main from hlky:trust-remote-code
Contributor

@hlky hlky commented Apr 12, 2026

What does this PR do?

As per #13446 trust_remote_code fails under several circumstances:

  • pretrained_model_name_or_path is Hub repo A and custom_pipeline is Hub repo B: trust_remote_code is bypassed and remote code runs from repo B
from diffusers import DiffusionPipeline

DiffusionPipeline.from_pretrained(
    "google/ddpm-cifar10-32", custom_pipeline="XManFromXlab/diffuser-custom-pipeline", trust_remote_code=False
)
  • pretrained_model_name_or_path is a local directory and custom_pipeline is Hub repo B: trust_remote_code is never checked and remote code runs from repo B
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

snapshot_path = snapshot_download(repo_id="google/ddpm-cifar10-32")
DiffusionPipeline.from_pretrained(
    snapshot_path, custom_pipeline="XManFromXlab/diffuser-custom-pipeline", trust_remote_code=False
)
  • pretrained_model_name_or_path is a local directory with custom components: trust_remote_code is never checked
from diffusers import DiffusionPipeline
from huggingface_hub import snapshot_download

snapshot_path = snapshot_download(repo_id="hf-internal-testing/tiny-sdxl-custom-components")
pipeline = DiffusionPipeline.from_pretrained(
    snapshot_path, trust_remote_code=False
)
assert pipeline.config.unet == ("diffusers_modules.local.my_unet_model", "MyUNetModel")
assert pipeline.config.scheduler == ("diffusers_modules.local.my_scheduler", "MyScheduler")

This moves trust_remote_code checks into get_cached_module_file where the actual custom module loading takes place. trust_remote_code is passed from several code paths for complete coverage.

I've added 3 separate ValueErrors in get_cached_module_file to account for the different sources of remote code: local, git and hub. The git path could be considered trusted, as these files come from https://huggingface.co/datasets/diffusers/community-pipelines-mirror. The hub path covers "custom pipelines", and the local path is reached for "custom components" (these are added to the allowed files in download, so they become local files).

With this PR, the above 3 cases are resolved:

The repository for XManFromXlab/diffuser-custom-pipeline contains custom code in pipeline.py which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/XManFromXlab/diffuser-custom-pipeline/blob/main/pipeline.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
The repository for XManFromXlab/diffuser-custom-pipeline contains custom code in pipeline.py which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/XManFromXlab/diffuser-custom-pipeline/blob/main/pipeline.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
ValueError: The directory C:\Users\user\.cache\huggingface\hub\models--hf-internal-testing--tiny-sdxl-custom-components\snapshots\ce2b9d3f819e7791f53053646ebe37d7e87d73d3\unet contains custom code in my_unet_model.py which must be executed to correctly load the model. You can inspect the file content at C:\Users\user\.cache\huggingface\hub\models--hf-internal-testing--tiny-sdxl-custom-components\snapshots\ce2b9d3f819e7791f53053646ebe37d7e87d73d3\unet\my_unet_model.py.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
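
For readers who want the shape of the gate without opening the diff, here is a minimal sketch of a single consent check that distinguishes the three sources named above. It is not the code from this PR: `CodeSource` and `require_trust` are hypothetical names, and only the message wording is borrowed from the outputs shown.

```python
from enum import Enum


class CodeSource(Enum):
    """Where a custom module would be loaded from (labels are illustrative)."""

    LOCAL = "local"
    GIT = "git"
    HUB = "hub"


def require_trust(source: CodeSource, location: str, file_name: str, trust_remote_code: bool = False) -> None:
    """Raise ValueError unless the caller explicitly opted in to running custom code."""
    if trust_remote_code:
        return
    # Pick a subject that tells the user which kind of source is about to run code.
    if source is CodeSource.LOCAL:
        subject = f"The directory {location}"
    elif source is CodeSource.GIT:
        subject = f"The community pipeline for {location}"
    else:
        subject = f"The repository for {location}"
    raise ValueError(
        f"{subject} contains custom code in {file_name} which must be executed to correctly load the model.\n"
        "Please pass the argument `trust_remote_code=True` to allow custom code to be run."
    )
```

Placing one gate at the loading step, rather than one per entry point, means every code path that ends in an import has to pass through it.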

Fixes #13446

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@@ -1674,21 +1678,6 @@ def download(cls, pretrained_model_name, **kwargs) -> str | os.PathLike:
custom_class_name = config_dict["_class_name"][1]

load_pipe_from_hub = custom_pipeline is not None and f"{custom_pipeline}.py" in filenames
Contributor Author

Just to note this remains to control repo_id:

repo_id=pretrained_model_name if load_pipe_from_hub else None,

def _get_custom_pipeline_class(
    custom_pipeline,
    repo_id=None,
    hub_revision=None,
    class_name=None,
    cache_dir=None,
    revision=None,
):
    if custom_pipeline.endswith(".py"):
        path = Path(custom_pipeline)
        # decompose into folder & file
        file_name = path.name
        custom_pipeline = path.parent.absolute()
    elif repo_id is not None:
        file_name = f"{custom_pipeline}.py"
        custom_pipeline = repo_id
    else:
        file_name = CUSTOM_PIPELINE_FILE_NAME

It helps distinguish between:
a) custom_pipeline is e.g. my_pipeline and that filename exists in pretrained_model_name's files
b) custom_pipeline is Hub repo (and pipeline.py is used)

Maybe could be renamed load_pipe_from_hub -> hub_contains_custom_pipeline?


class CustomPipelineTests(unittest.TestCase):
    def test_load_custom_pipeline(self):
        with self.assertRaises(ValueError):
Member

Should we check the ValueError message as well (it should be the one about trust_remote_code and not some unrelated error)?

Member

And on main, this should not have yielded a ValueError, right? That is how we know, for one instance, that it's broken.

Contributor Author

On main:

pretrained custom_pipeline trust_remote_code?
hub/repoA my_pipeline
hub/repoA one_step_unet[1]
hub/repoA hub/repoB
any local directory any

PR:

pretrained custom_pipeline trust_remote_code?
hub/repoA my_pipeline
hub/repoA one_step_unet[1]
hub/repoA hub/repoB
any local directory any

[1] or any community pipeline name

The community pipeline case [1] is more a question of implicit vs explicit consent, but on main there is potential for misuse by combining the "trusted" nature of community pipeline names with third-party Hub repos.

A user may copy an example like:

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="one_step_unet")

or

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="pipeline_stable_diffusion_xl_controlnet_adapter_inpaint")

or

pipeline = DiffusionPipeline.from_pretrained("google/ddpm-cifar10-32", custom_pipeline="pipeline_stable_diffusion_x/_controlnet_adapter_inpaint")

The first two are harmless and come from diffusers/community-pipelines-mirror. The third is malicious: a user has registered as pipeline_stable_diffusion_x and created a repo named _controlnet_adapter_inpaint. There are many community pipelines, so there are many potential username/repo name combinations that could easily be missed.

Considering that I think community pipeline names should remain trusted, WDYT? We can just remove this to do so.

if not trust_remote_code:
    raise ValueError(
        f"The community pipeline for {pretrained_model_name_or_path} contains custom code which must be executed to correctly "
        f"load the model. You can inspect the repository content at https://hf.co/datasets/{COMMUNITY_PIPELINES_MIRROR_ID}/blob/main/{revision}/{pretrained_model_name_or_path}.py.\n"
        f"Please pass the argument `trust_remote_code=True` to allow custom code to be run."
    )
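
The lookalike-name risk described above comes down to how a custom_pipeline string gets classified. Here is a minimal sketch, assuming (for illustration only) that a string with no "/" is treated as a community pipeline name and anything else as a Hub repo ID. `classify_custom_pipeline` is a hypothetical helper and may not match the library's exact rules.

```python
def classify_custom_pipeline(custom_pipeline: str) -> str:
    """Classify a custom_pipeline string under the assumed slash-count rule."""
    if custom_pipeline.endswith(".py"):
        return "local file"
    if custom_pipeline.count("/") == 0:
        return "community pipeline name"
    return "hub repo"


# The three strings from the examples above; the last differs by a single character.
examples = [
    "one_step_unet",
    "pipeline_stable_diffusion_xl_controlnet_adapter_inpaint",
    "pipeline_stable_diffusion_x/_controlnet_adapter_inpaint",
]
for name in examples:
    print(f"{name!r}: {classify_custom_pipeline(name)}")
```

Under that assumption, swapping one character for "/" moves the string from the mirror to an arbitrary third-party repo, which is why the Hub branch needs its own consent check regardless of what is decided for community names.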

revision: str | None = None,
local_files_only: bool = False,
local_dir: str | None = None,
trust_remote_code: bool = False,
Member

Would it make sense to add the ValueError to the caller sites of get_cached_module_file instead? Because the function itself isn't specifically tied to custom pipelines, I think.

custom_class_name = config_dict["_class_name"][1]

load_pipe_from_hub = custom_pipeline is not None and f"{custom_pipeline}.py" in filenames
load_components_from_hub = len(custom_components) > 0
Member

Why is this going?

Contributor Author

See case 3 - this code is in download which is only reached when we use a Hub path with from_pretrained. It is replaced by the check in get_cached_module_file, specifically the local code path.

Consider this scenario:

hf download rotcasuoicilam/SuperCoolNewModel --local-dir rotcasuoicilam/SuperCoolNewModel
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained("rotcasuoicilam/SuperCoolNewModel")

rotcasuoicilam/SuperCoolNewModel contains malicious custom components. The user downloads the Hub repo assuming it is safe, and Diffusers then loads the custom components without the user's consent.

Member

I didn't get it.

hf download rotcasuoicilam/SuperCoolNewModel --local-dir rotcasuoicilam/SuperCoolNewModel

is agnostic to DiffusionPipeline.from_pretrained(...).

Contributor Author

  1. A malicious actor uploads a model with malicious custom components
  2. Either:
    2a. The user follows instructions that say to download the model first then run from the local path
    2b. The user chooses to download the model first out of personal preference
  3. DiffusionPipeline.from_pretrained(the_local_path)
  4. pwned

Member

Let's add a test case for this scenario then.

Contributor Author

c0e0731

On main:

FAILED tests/pipelines/test_pipelines.py::CustomPipelineTests::test_custom_components_from_local_dir - AssertionError: ValueError not raised

PR:

1 passed

Member

A bit more elaborate explanation for other reviewers (feel free to correct).

The critical branching point is:

if not os.path.isdir(pretrained_model_name_or_path):
    # ... calls cls.download() which had the trust_remote_code check
else:
    cached_folder = pretrained_model_name_or_path

When you call from_pretrained("rotcasuoicilam/SuperCoolNewModel"):

  1. os.path.isdir("rotcasuoicilam/SuperCoolNewModel") is checked.
  2. If the user previously ran hf download ... --local-dir rotcasuoicilam/SuperCoolNewModel, that directory exists locally.
  3. So os.path.isdir() returns True, and the code takes the else branch at line 871 — it just sets cached_folder = pretrained_model_name_or_path directly.
  4. The download() method is never called.

The old trust_remote_code check for custom components lived inside download(). Since download() is skipped entirely when the path is a
local directory, the check never runs. The custom components in that local folder get loaded without any consent gate.

That's why the fix moves the trust_remote_code check into get_cached_module_file — that's where the actual import of custom .py files
happens, and it runs regardless of whether the code came through download() or the local else branch.
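
The difference between the two placements can be reproduced with a small stand-alone simulation. This is a sketch of the control flow only: the function names are hypothetical stand-ins, nothing here imports diffusers, and the returned strings merely mark where custom code would have been executed.

```python
import os
import tempfile


def load_custom_module(folder: str, trust_remote_code: bool) -> str:
    """Stand-in for the step that imports custom .py files; the gate lives here."""
    if not trust_remote_code:
        raise ValueError("Please pass the argument `trust_remote_code=True` to allow custom code to be run.")
    return f"custom code loaded from {folder}"


def from_pretrained_gate_in_download(path: str, trust_remote_code: bool = False) -> str:
    """Old shape: the only gate sits in the download branch."""
    if not os.path.isdir(path):
        if not trust_remote_code:  # gate inside the stand-in for download()
            raise ValueError("Please pass the argument `trust_remote_code=True` to allow custom code to be run.")
        cached_folder = f"<cache>/{path}"
    else:
        cached_folder = path  # local directory: download branch skipped, no gate
    return f"custom code loaded from {cached_folder}"


def from_pretrained_gate_at_load(path: str, trust_remote_code: bool = False) -> str:
    """New shape: both branches funnel into the gated loader."""
    cached_folder = path if os.path.isdir(path) else f"<cache>/{path}"
    return load_custom_module(cached_folder, trust_remote_code)


with tempfile.TemporaryDirectory() as local_dir:
    # Old shape: a local directory slips past the gate even with the default False.
    print(from_pretrained_gate_in_download(local_dir))
    # New shape: the same call is refused until the caller opts in.
    try:
        from_pretrained_gate_at_load(local_dir)
    except ValueError as err:
        print(f"refused: {err}")
```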

Contributor Author

Just to be clear: using the same local-dir as the Hub repo ID is one possible trick to hide the attack, since anyone who didn't pre-download wouldn't be affected. But any local directory is affected, and the local directory could come from other sources such as snapshot_download or git clone.

@sayakpaul sayakpaul requested a review from DN6 April 13, 2026 03:06
@github-actions github-actions bot added size/M PR with diff < 200 LOC and removed size/M PR with diff < 200 LOC labels Apr 13, 2026
@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

Style bot fixed some files and pushed the changes.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky
Contributor Author

hlky commented Apr 13, 2026

I was curious how common custom components are on the Hub. The results are limited but I managed to scrape 9602 Hub repo paths from the model pages. 190 were gated, 4685 actually had model_index.json - the rest must be LoRA or mis-tagged. Out of those, 58 had custom components with a total of 97 custom components. It is unlikely any of these are malicious but it is still interesting that if loading from a local path any of these Hub repos would currently load custom code without requiring trust_remote_code=True.
has_module.json
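
A local directory can be audited for custom components before it is loaded. The sketch below is not the script used for the scrape above. It assumes that each component entry in model_index.json has the form [module, class name], and that a component is custom when a file named after that module exists in the component's folder, which matches the unet/my_unet_model.py layout in the error output earlier. `find_custom_components` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_custom_components(model_dir: str) -> dict[str, str]:
    """Return {component: relative .py path} for components that look custom."""
    root = Path(model_dir)
    config = json.loads((root / "model_index.json").read_text())
    found = {}
    for component, value in config.items():
        # Skip metadata keys such as _class_name and anything that is not a 2-item entry.
        if component.startswith("_") or not isinstance(value, list) or len(value) != 2:
            continue
        module_name = value[0]
        if not isinstance(module_name, str):
            continue
        # Assumed rule: a matching .py file inside the component folder means custom code.
        candidate = root / component / f"{module_name}.py"
        if candidate.is_file():
            found[component] = str(candidate.relative_to(root))
    return found
```

Calling `find_custom_components(snapshot_path)` on a downloaded snapshot lists the files that would need to be reviewed before passing trust_remote_code=True.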

Development

Successfully merging this pull request may close these issues.

Insufficient trust check for custom_pipeline parameter of DiffusionPipeline.from_pretrained method

3 participants