
Conversation

@miguelmartin75 commented Dec 17, 2025

What does this PR do?

This PR adds Cosmos Predict2.5 Base. It has been tested using the 2B model checkpoint; the official HF checkpoint is here. The converted checkpoints have yet to be uploaded to HF.

This change is largely based on the previous Predict1/Predict2 support done by @a-r-r-o-w

Testing:

  • Aside from the unit tests, the examples in the Cosmos Predict2.5 README have been checked against, and the inference pipeline produces similar outputs.
  • These are included in the docstring for the pipeline

Additions

Pipelines:

  • A base pipeline which handles all modes for the Predict2.5 Base checkpoint: Text2World, Image2World, Video2World
    • This pipeline loads the Reason1 checkpoint via Qwen2_5_VLForConditionalGeneration
    • It also supports image output when num_frames=1 is passed
    • Limitations: the pipeline assumes batch_size == 1
  • Three derivative pipelines are built on top of the base pipeline to handle the official modes: Text2World, Image2World, Video2World
  • Unit tests are added, based on the tests present for Cosmos Predict2

Scheduler:

  • FlowUniPCMultistepScheduler: a new scheduler that runs the UniPC algorithm on the EDM noise schedule (Karras sigmas), since Predict2.5 uses flow matching. The name can be changed.
  • This is done to match the predict2.5 codebase
  • Unit tests covering the basics are added. Note that the UniPC algorithm itself is not unit tested.

Model changes:

  • Modified CosmosTransformer to accept an optional cross-attention projection layer (used for text embeddings from Reason1)

Scripts:

  • Extended scripts/convert_cosmos_to_diffusers.py to support Predict2.5

Who can review?

@miguelmartin75 miguelmartin75 changed the title Cosmos/predict2.5 base pr ready Cosmos Predict2.5 Base Model Dec 17, 2025
@miguelmartin75 miguelmartin75 changed the title Cosmos Predict2.5 Base Model Cosmos Predict2.5 Base: inference pipeline, checkpoint conversion & scheduler Dec 17, 2025
@miguelmartin75 miguelmartin75 changed the title Cosmos Predict2.5 Base: inference pipeline, checkpoint conversion & scheduler Cosmos Predict2.5 Base: inference pipeline, scheduler & chkpt conversion Dec 17, 2025
scheduler = FlowUniPCMultistepScheduler()

# NOTE: using Qwen2 VL instead for tests (reason1 is based on 2.5)
text_encoder = Qwen2VLForConditionalGeneration.from_pretrained(
Author

Is there an internal Qwen2_5_VL model to test with?

Member

Would something like this work?

text_encoder = Qwen2_5_VLForConditionalGeneration(config)

@sayakpaul (Member) left a comment

Thanks for the PR!

I left some comments. My major comments are on separating the pipelines from one another instead of inheriting from one another.

Let's also add docs?

hidden_size, patch_size[0] * patch_size[1] * patch_size[2] * out_channels, bias=False
)

self.use_crossattn_projection = use_crossattn_projection
Member

We don't have to assign the variables like this because register_to_config will have them included in the model.config:

Suggested change
self.use_crossattn_projection = use_crossattn_projection

else:
assert False

if self.use_crossattn_projection:
Member

Suggested change
if self.use_crossattn_projection:
if self.config.use_crossattn_projection:
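
The reviewer's point is that register_to_config captures constructor arguments onto model.config. A minimal sketch of that idea (illustrative only, not the actual diffusers implementation, which lives in ConfigMixin):

```python
# Illustrative sketch of the register_to_config pattern; the real
# diffusers decorator is more involved (handles positional args, etc.).
import functools
from types import SimpleNamespace

def register_to_config(init):
    @functools.wraps(init)
    def wrapper(self, **kwargs):
        # Capture constructor kwargs on self.config so the module body
        # can read self.config.<name> instead of storing each attribute.
        self.config = SimpleNamespace(**kwargs)
        init(self, **kwargs)
    return wrapper

class Transformer:
    @register_to_config
    def __init__(self, use_crossattn_projection=False, hidden_size=16):
        pass  # no `self.use_crossattn_projection = ...` needed

model = Transformer(use_crossattn_projection=True, hidden_size=32)
print(model.config.use_crossattn_projection)  # True
```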

"""


class Cosmos2_5_PredictBase(DiffusionPipeline):
Member

Suggested change
class Cosmos2_5_PredictBase(DiffusionPipeline):
class Cosmos2_5_PredictBasePipeline(DiffusionPipeline):

And elsewhere


return prompt_embeds, negative_prompt_embeds

def prepare_latents(
Member

Just checking if this is similar to any of the other existing Cosmos pipelines? If so, maybe this can be supplemented with a "# Copied from ..." comment?

self._current_timestep = None

if not output_type == "latent":
assert self.latents_mean is not None and self.latents_std is not None, (
Member

We avoid assert statements like this in the modules. If we really want to raise, let's raise Errors instead.

video = self.vae.decode(latents.to(self.vae.dtype), return_dict=False)[0]

assert self.safety_checker is not None
self.safety_checker.to(device)
Member

We don't have to do it in this PR, but we could have a little utility like run_safety_checker() inside the pipelines and copy it over all the cosmos pipelines that require it (much akin to encode_prompt(), for example).

But this is not merge-blocking.
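
Such a utility might look roughly like the following sketch. Everything here is hypothetical (the helper name run_safety_checker comes from the comment above; the checker interface, a callable returning a per-frame flag, is an assumption):

```python
# Hypothetical sketch of a run_safety_checker() helper; the checker
# interface (a callable returning a bool per frame) is an assumption.
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # stand-in for an image tensor

def run_safety_checker(
    frames: Sequence[Frame],
    checker: Callable[[Frame], bool],
) -> Tuple[List[Frame], List[bool]]:
    """Blank out frames the checker flags and report which were flagged,
    mirroring how other pipelines replace unsafe outputs."""
    flags = [checker(f) for f in frames]
    safe = [([0.0] * len(f) if bad else f) for f, bad in zip(frames, flags)]
    return safe, flags

# Usage with a trivial stub checker that flags all-negative frames:
frames = [[0.1, 0.2], [-1.0, -2.0]]
safe, flags = run_safety_checker(frames, lambda f: all(x < 0 for x in f))
print(flags)  # [False, True]
print(safe)   # [[0.1, 0.2], [0.0, 0.0]]
```

A helper like this could then be copied across the Cosmos pipelines via "# Copied from ..." comments, much like encode_prompt().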

return CosmosPipelineOutput(frames=video)


class Cosmos2_5_PredictText2World(Cosmos2_5_PredictBase):
Member

This feels quite odd. We usually maintain one pipeline class per pipeline file. That's been followed in the existing cosmos implementations as well:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/cosmos

We don't inherit from pipelines like Cosmos2_5_PredictText2World(Cosmos2_5_PredictBase), either. Instead, we use "# Copied from ..." (we will unbloat quite a bit of that in #12820, but that's a separate thing).

Collaborator

Could we follow the same structure as Cosmos2, with separate files for each pipeline?
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/cosmos

Author

I will remove these pipelines

@sayakpaul sayakpaul requested review from DN6 and yiyixuxu December 17, 2025 04:06
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment

Thanks for the PR! I left some feedback. Mainly:

  1. Can we look into supporting the scheduler from the existing UniPCMultistepScheduler with the flow matching options (use_flow_sigmas, prediction_type="flow_prediction")? I'm ok adding this if it requires a lot of changes or just doesn't make sense to use the existing one, but wanted to check first.

  2. For the pipeline, can we combine the 3 pipelines into one Cosmos2_5PredictPipeline that inherits directly from DiffusionPipeline? The current design isn't how we typically structure pipelines in diffusers. This isn't ideal since our pipelines are normally task-based (text2image, image2video, etc.), but I have to admit it's getting increasingly difficult to keep that pattern without a huge portion of duplicated code. I think a single unified pipeline is reasonable here.

return sigmas, timesteps


class FlowUniPCMultistepScheduler(SchedulerMixin, ConfigMixin):
Collaborator

The existing UniPC multistep scheduler also supports flow matching: https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_unipc_multistep.py#L212
Do you think we will be able to reuse it?

@miguelmartin75 (Author) commented Dec 17, 2025

This class doesn't output the sigma values needed when setting use_flow_sigmas=True; basically, what I want is for use_karras_sigmas=True and use_flow_sigmas=True to work at the same time (with arguments for sigma_max and sigma_min).

After reading the code in the UniPC multistep scheduler, I have modified the class to support the above. I've updated the PR with this change (it's ~10 LOC); see the change here. Should I remove the FlowUniPC scheduler and keep this small change with an explicit test case for this setting?

Here are the outputs from a script for 35 steps (with the change applied to the branch). Without the change, setting both to True is equivalent to use_karras_sigmas=True, as that branch is hit first in the if/elif block:

FlowUniPCMultistepScheduler:
Timesteps (len=35)
tensor([995, 994, 993, 991, 990, 988, 986, 983, 980, 976, 971, 965, 957, 948,
        936, 920, 901, 877, 847, 809, 763, 708, 642, 568, 486, 402, 319, 243,
        176, 122,  81,  51,  31,  18,   9])
Sigmas: (len=36)
tensor([0.9950, 0.9942, 0.9932, 0.9920, 0.9905, 0.9887, 0.9865, 0.9839, 0.9806,
        0.9766, 0.9717, 0.9655, 0.9578, 0.9482, 0.9360, 0.9208, 0.9016, 0.8775,
        0.8473, 0.8098, 0.7637, 0.7081, 0.6427, 0.5683, 0.4870, 0.4025, 0.3195,
        0.2430, 0.1768, 0.1229, 0.0817, 0.0519, 0.0315, 0.0181, 0.0099, 0.0000])
UniPCMultistepScheduler(**{'use_flow_sigmas': True}):
Timesteps (len=35)
tensor([999, 970, 941, 913, 884, 856, 827, 799, 770, 742, 713, 685, 656, 627,
        599, 570, 542, 513, 485, 456, 428, 399, 371, 342, 313, 285, 256, 228,
        199, 171, 142, 114,  85,  57,  28])
Sigmas: (len=36)
tensor([0.9990, 0.9705, 0.9419, 0.9134, 0.8848, 0.8563, 0.8277, 0.7992, 0.7707,
        0.7421, 0.7136, 0.6850, 0.6565, 0.6279, 0.5994, 0.5709, 0.5423, 0.5138,
        0.4852, 0.4567, 0.4281, 0.3996, 0.3711, 0.3425, 0.3140, 0.2854, 0.2569,
        0.2283, 0.1998, 0.1713, 0.1427, 0.1142, 0.0856, 0.0571, 0.0285, 0.0000])
torch.allclose?:  False

UniPCMultistepScheduler(**{'use_karras_sigmas': True}):
Timesteps (len=35)
tensor([999, 999, 991, 975, 957, 939, 920, 900, 880, 858, 835, 810, 784, 757,
        727, 695, 661, 623, 583, 538, 488, 434, 375, 312, 248, 188, 136,  94,
         62,  39,  23,  12,   6,   2,   0])
Sigmas: (len=36)
tensor([2.0000e+02, 1.7084e+02, 1.4539e+02, 1.2327e+02, 1.0410e+02, 8.7544e+01,
        7.3297e+01, 6.1086e+01, 5.0662e+01, 4.1801e+01, 3.4303e+01, 2.7989e+01,
        2.2698e+01, 1.8289e+01, 1.4636e+01, 1.1626e+01, 9.1637e+00, 7.1625e+00,
        5.5482e+00, 4.2564e+00, 3.2315e+00, 2.4258e+00, 1.7988e+00, 1.3162e+00,
        9.4913e-01, 6.7355e-01, 4.6961e-01, 3.2106e-01, 2.1476e-01, 1.4017e-01,
        8.8994e-02, 5.4745e-02, 3.2478e-02, 1.8473e-02, 1.0000e-02, 0.0000e+00])
torch.allclose?:  False

UniPCMultistepScheduler(**{'use_karras_sigmas': True, 'use_flow_sigmas': True}):
Timesteps (len=35)
tensor([995, 994, 993, 991, 990, 988, 986, 983, 980, 976, 971, 965, 957, 948,
        936, 920, 901, 877, 847, 809, 763, 708, 642, 568, 486, 402, 319, 243,
        176, 122,  81,  51,  31,  18,   9])
Sigmas: (len=36)
tensor([0.9950, 0.9942, 0.9932, 0.9920, 0.9905, 0.9887, 0.9865, 0.9839, 0.9806,
        0.9766, 0.9717, 0.9655, 0.9578, 0.9482, 0.9360, 0.9208, 0.9016, 0.8775,
        0.8473, 0.8098, 0.7637, 0.7081, 0.6427, 0.5683, 0.4870, 0.4025, 0.3195,
        0.2430, 0.1768, 0.1229, 0.0817, 0.0519, 0.0315, 0.0181, 0.0099, 0.0000])
torch.allclose?:  True

Note: I have a bug in FlowUniPCMultistepScheduler where it outputs len=37 sigmas when I set num_inference_steps=35, so for comparison I called set_timesteps(num_inference_steps - 1).
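
The matching schedules above (FlowUniPCMultistepScheduler and UniPC with both flags) can be reproduced with a few lines of plain Python. This is a sketch under the assumption that the schedule uses standard Karras rho-spacing (rho=7.0) with sigma_max=200 and sigma_min=0.01, and maps each EDM sigma s to a flow sigma via s / (1 + s); it is not the diffusers API:

```python
# Sketch only; assumes rho=7.0, sigma_min=0.01, sigma_max=200.0 and the
# flow transform sigma_flow = sigma / (1 + sigma). The trailing 0.0 that
# diffusers appends to the sigma list is omitted here.
def karras_sigmas(n, sigma_min=0.01, sigma_max=200.0, rho=7.0):
    """EDM/Karras sigma spacing over n steps (descending)."""
    max_inv, min_inv = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    return [(max_inv + r * (min_inv - max_inv)) ** rho for r in ramp]

def flow_sigmas(n, **kw):
    """Map EDM sigmas into [0, 1) flow-matching sigmas."""
    return [s / (1 + s) for s in karras_sigmas(n, **kw)]

sigmas = flow_sigmas(35)
timesteps = [int(s * 1000) for s in sigmas]
print(sigmas[0], sigmas[-1])  # ~0.9950 down to ~0.0099
print(timesteps[0])           # 995
```

Under these assumptions the endpoints match the tensors printed above (0.9950 ... 0.0099, timestep 995), which is consistent with the flow sigmas being Karras sigmas pushed through s / (1 + s).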


For additional context:

I created a new scheduler because the UniPC class was forked in the predict2.5 codebase (which I refactored). I did look into trying to reuse UniPC, but due to the settings above and the complexity of the class, I decided to just clean up this fork. In my opinion, the UniPC class is quite complex because:

  1. It is initialized with a VP-type noise schedule, but this schedule is not enforced for inference: https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_unipc_multistep.py#L251-L255
    • e.g. with use_flow_sigmas=True or use_karras_sigmas=True this constraint is not enforced; use_flow_sigmas=True is only applied in set_timesteps, which is used for inference
  2. betas and other parameters are not needed
  3. The EDM noise schedule can be achieved via use_karras_sigmas=True, but this is in conflict with use_flow_sigmas=True
  4. It has many parameters

In my opinion, a simpler design for this class would use composition to obtain the sigmas that UniPC operates on. The class already uses some composition, e.g. the solver_p argument overrides the predictor step, but it doesn't do the same for the sigmas/timesteps that UniPC runs on. That said, I do understand the philosophy behind diffusers; copying the code is preferred.
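
The composition idea could look roughly like this sketch (all names here, such as SigmaSchedule and ComposedUniPCScheduler, are hypothetical and not diffusers API): the noise schedule is supplied as a callable, and the solver only consumes sigmas.

```python
# Hypothetical design sketch; the solver is agnostic to how its sigmas
# are produced, so Karras/flow/linear spacing can be swapped freely.
from typing import Callable, List

SigmaSchedule = Callable[[int], List[float]]  # num_steps -> descending sigmas

def linear_flow_schedule(n: int) -> List[float]:
    """One possible schedule: evenly spaced flow sigmas in (0, 1)."""
    return [(n - i) / (n + 1) for i in range(n)]

class ComposedUniPCScheduler:
    """Solver shell that receives its sigma schedule via injection."""

    def __init__(self, schedule: SigmaSchedule):
        self.schedule = schedule
        self.sigmas: List[float] = []
        self.timesteps: List[int] = []

    def set_timesteps(self, num_inference_steps: int) -> None:
        # The schedule is injected, so changing the spacing never
        # touches the solver code; append the final 0.0 as diffusers does.
        self.sigmas = self.schedule(num_inference_steps) + [0.0]
        self.timesteps = [int(s * 1000) for s in self.sigmas[:-1]]

sched = ComposedUniPCScheduler(linear_flow_schedule)
sched.set_timesteps(35)
print(len(sched.sigmas))  # 36: one sigma per step plus the final 0.0
```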

)


class Cosmos2_5_PredictImage2World(Cosmos2_5_PredictBase):
Collaborator

All our pipelines should inherit from DiffusionPipeline, so in the future we could consider supporting Cosmos in Modular Diffusers: https://huggingface.co/docs/diffusers/main/en/modular_diffusers/overview

For this PR, to get it in ASAP, I think the best option is to keep a single Cosmos2_5_PredictPipeline that handles all 3 use cases

Author

I will remove these three specialized pipelines; the base pipeline is more flexible and provides relevant default arguments for each mode in 2World

@miguelmartin75 miguelmartin75 force-pushed the cosmos/predict2.5-base-pr-ready branch from d969b9a to 94d2b41 Compare December 17, 2025 20:17