Adds NVIDIA PixelDiT and PiD support#1393

Merged: mcmonkey4eva merged 15 commits into mcmonkeyprojects:master from jtreminio:pixeldit-pid-support on Jun 5, 2026
@jtreminio (Contributor) commented May 26, 2026

Depends on Comfy-Org/ComfyUI#14103

Not included: docs updates.

PixelDiT is an image model. Not that great.

The interesting part of this PR is the PiD, a 4x-locked upscaler that now replaces the refiner stage's upscaler. The upscale happens after the refiner's SwarmKSampler node.

PixelDiT workflow:
[Screenshot: CleanShot 2026-05-26 at 12 30 51]

PiD upscale workflow:
[Screenshot: CleanShot 2026-05-27 at 18 28 36]

@jtreminio jtreminio marked this pull request as ready for review May 27, 2026 23:29
@jtreminio (Contributor, Author) commented:

[Image: 1839001-A high-quality, cinematic portrait featu]

Comment thread docs/Model Support.md Outdated
# PixelDiT

- NVIDIA's [PixelDiT](<https://huggingface.co/Comfy-Org/PixelDiT>) is supported in SwarmUI!
- Or the smaller FP8 version: [Comfy-Org/PixelDiT - mxfp8](<https://huggingface.co/Comfy-Org/PixelDiT/resolve/main/diffusion_models/pixeldit_1300m_1024px_mxfp8.safetensors>)
Member:

wonktext

if (doUpscale && upscaleMethod.StartsWith("pidmodel-"))
{
    string pidModelName = upscaleMethod.After("pidmodel-");
    T2IModel pidModel = Program.MainSDModels.GetModel(pidModelName);
Member:

check t2iprompthandling for "lora", there's a weird special case pattern for how indirectly specified models are read that accommodates both white/blacklisting of models and user-typing issues (eg excluding the .safetensors or not)
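
A minimal sketch of the kind of tolerant lookup the comment above describes, not SwarmUI's actual implementation: `ModelNameMatcher`, `Normalize`, and `FindMatch` are hypothetical names, and the `allowed` list is assumed to have been filtered by any whitelist or blacklist rules beforehand. It only illustrates matching a user-typed name whether or not `.safetensors` was included.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of SwarmUI.
public static class ModelNameMatcher
{
    private const string Extension = ".safetensors";

    // Canonical form: trimmed, forward slashes, lower case, no ".safetensors" suffix.
    public static string Normalize(string name)
    {
        string cleaned = name.Trim().Replace('\\', '/').ToLowerInvariant();
        return cleaned.EndsWith(Extension, StringComparison.Ordinal)
            ? cleaned.Substring(0, cleaned.Length - Extension.Length)
            : cleaned;
    }

    // Returns the first allowed name that equals the request after normalization, or null if none match.
    public static string FindMatch(string requested, IEnumerable<string> allowed)
    {
        string target = Normalize(requested);
        return allowed.FirstOrDefault(candidate => Normalize(candidate) == target);
    }
}
```

Resolving against the already-permitted list, rather than calling `GetModel` on the raw string, keeps the permission check and the name cleanup in one place.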

string pidSampled = g.CreateKSampler(g.CurrentModel.Path, [pidCond, 0], pidNeg, [pidEmptyLatent, 0], pidCfg, pidSteps, 0, 10000,
    g.UserInput.Get(T2IParamTypes.Seed) + 2, false, true, defsampler: "lcm", defscheduler: "simple", explicitSampler: pidSampler, explicitScheduler: pidScheduler, sectionId: T2IParamInput.SectionID_PixelDecoder);
g.CurrentMedia = g.CurrentMedia.WithPath([pidSampled, 0], WGNodeData.DT_LATENT_IMAGE, pidModel.ModelClass?.CompatClass);
g.CurrentMedia.Width = pidWidth;
Member:

for the Refiner Upscale, since target size is user-specified, follow user specified size by way of doing a post-rescale in pixel space, see how ImageUpscaleWithModel does it above
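
The arithmetic behind that request can be sketched as follows. This is an illustrative helper under stated assumptions, not code from the PR: `PidRescalePlan` and `Plan` are hypothetical names, and the only fact taken from this thread is that PiD is locked to 4x.

```csharp
// Hypothetical helper for illustration only; not part of SwarmUI.
public static class PidRescalePlan
{
    // PiD is described in this PR as a 4x-locked upscaler.
    public const int PidScale = 4;

    // Returns the size PiD will produce and whether a pixel-space resize
    // is still needed afterwards to reach the user-specified target size.
    public static (int PidWidth, int PidHeight, bool NeedsResize) Plan(
        int sourceWidth, int sourceHeight, int targetWidth, int targetHeight)
    {
        int pidWidth = sourceWidth * PidScale;
        int pidHeight = sourceHeight * PidScale;
        bool needsResize = pidWidth != targetWidth || pidHeight != targetHeight;
        return (pidWidth, pidHeight, needsResize);
    }
}
```

For example, a 1024x1024 image with a 1.5x refiner upscale targets 1536x1536, while PiD outputs 4096x4096, so `NeedsResize` is true and the result would be scaled down in pixel space. A 512x512 image with a 2048x2048 target needs no extra resize.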

Comment thread src/Text2Image/T2IModelClassSorter.cs Outdated
bool isHiDreamO1Lora(JObject h) => hasLoraKey(h, "final_layer2.linear") && hasLoraKey(h, "language_model.layers.0.self_attn.q_proj");
bool isChroma(JObject h) => h.ContainsKey("distilled_guidance_layer.in_proj.bias") && h.ContainsKey("double_blocks.0.img_attn.proj.bias");
bool isChromaRadiance(JObject h) => h.ContainsKey("nerf_image_embedder.embedder.0.bias");
bool isPiD(JObject h) => h.ContainsKey("net.lq_proj.latent_proj.0.weight");
Member:

could you pick another key or two each just to narrow it? The list is getting long enough that we're getting occasional surprise overlaps.

@jtreminio (Contributor, Author):

Added net.pixel_blocks.0.attn.q_norm.weight for isPiD() and core.pixel_blocks.0.attn.q_norm.weight for isPixelDiT(). I figure keys with pixel_ in them aren't very common (yet). Clearing metadata is clean.
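
A sketch of what the narrowed PiD check might look like, assuming the two keys are combined with a logical AND. The real classifier works on a `JObject` header; a plain dictionary stands in here so the example has no package dependency, and `PidDetector` is a hypothetical name. Only the two PiD keys named in this thread are used.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the classifier check; not the PR's actual code.
public static class PidDetector
{
    // Both keys are named in this thread; requiring both narrows the match.
    private static readonly string[] RequiredKeys =
    {
        "net.lq_proj.latent_proj.0.weight",
        "net.pixel_blocks.0.attn.q_norm.weight"
    };

    // True only when every required key is present in the model header.
    public static bool IsPiD(IReadOnlyDictionary<string, object> header)
    {
        foreach (string key in RequiredKeys)
        {
            if (!header.ContainsKey(key))
            {
                return false;
            }
        }
        return true;
    }
}
```

A header that carries only one of the two keys is rejected, which is the point of adding the second key: an unrelated model that happens to share one tensor name no longer matches.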

@jtreminio jtreminio marked this pull request as draft May 29, 2026 12:37
* as base model - for when a user uploads an image I guess
* as refiner model - if base model isn't a compatible vae user, load the vae and add a vae decode/encode pair
* refiner upscale model - base -> pid -> downscale or upscale with lanczos (if needed) -> refiner swarmksampler
* after the refiner swarmksampler; if refiner model isn't a compatible vae user, load the vae and add a vae decode/encode pair
@jtreminio jtreminio marked this pull request as ready for review June 4, 2026 17:46
Comment thread src/BuiltinExtensions/ComfyUIBackend/WorkflowGenerator.cs
public (WGNodeData, string) CreatePidCompatLatent(T2IModel pidModel, WGNodeData media, WGNodeData decodeVae)
{
string mediaFamily = media.IsLatentData ? media.Compat?.VaeFamily : null;
string family = PidFamilyOfModel(pidModel) ?? mediaFamily ?? "flux1";
Member:

if flux1 and flux2 both are fine, iirc flux2 is a much better latent format

@mcmonkey4eva mcmonkey4eva force-pushed the pixeldit-pid-support branch from c40732f to 5a796a3 Compare June 5, 2026 01:02
@mcmonkey4eva mcmonkey4eva merged commit be8ed96 into mcmonkeyprojects:master Jun 5, 2026
3 checks passed