fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile #12851

badayvedat · 2025-12-17T00:27:58Z

What does this PR do?

The _wrapped_flash_attn_3 function unconditionally unpacks both out and lse from the return value:

out, lse, *_ = flash_attn_3_func(...)

However, it was not passing return_attn_probs=True to request the tuple return. Since Dao-AILab/flash-attention@203b9b3, flash_attn_func returns only out by default, causing:

 File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_compile.py", line 51, in inner
    return disable_fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/.venv/lib/python3.12/site-packages/torch/_library/custom_ops.py", line 367, in wrapped_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/root/flash-attention/diffusers/src/diffusers/models/attention_dispatch.py", line 643, in _wrapped_flash_attn_3
    out, lse, *_ = flash_attn_3_func(
    ^^^^^^^^^^^^
ValueError: not enough values to unpack (expected at least 2, got 1)

How does this pr fixes it

Adds return_attn_probs=True to the flash_attn_3_func call, consistent with how _flash_attention_3_hub handles.

Reproduction

# requirements.txt
git+github.com/huggingface/diffusers@5e48f466b9c0d257f2650e8feec378a0022f2402"
torch==2.7.1
transformers
accelerate
--extra-index-url=https://download.pytorch.org/whl/cu128

and bring your own flash attention build, to repro this i built it from source @ Dao-AILab/flash-attention@ac9b5f1

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
).to("cuda")

pipe.transformer.set_attention_backend("_flash_3")

# ValueError: not enough values to unpack (expected at least 2, got 1)
pipe("a photo of a cat", num_inference_steps=1)

Alternative

The wrapper seems to exist to support fa3 as custom op. However, fa3 now has native torch.compile support as of Dao-AILab/flash-attention@c7697bb. This might be making _wrapped_flash_attn_3 redundant, tho i dont really know if that is the only reason.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

General functionalities: @sayakpaul @yiyixuxu @DN6

sayakpaul · 2025-12-17T03:40:56Z

Thanks for your PR. Since torch.compile support has been merged, would you be interested in refactoring and cleaning up the FA3 backend in attention_dispatch.py?

fix: flash_attn_3_func return value unpacking in _wrapped_flash_attn_3

abe7d24

badayvedat changed the title ~~fix: flash_attn_3_func return value unpacking in _wrapped_flash_attn_3 w torch compile~~ fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile Dec 17, 2025

badayvedat marked this pull request as ready for review December 17, 2025 00:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile #12851

fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile #12851

badayvedat commented Dec 17, 2025 •

edited

Loading

Uh oh!

sayakpaul commented Dec 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile #12851

Are you sure you want to change the base?

fix: flash_attn_3_func value unpacking in _wrapped_flash_attn_3 w th compile #12851

Conversation

badayvedat commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

How does this pr fixes it

Reproduction

Alternative

Before submitting

Who can review?

Uh oh!

sayakpaul commented Dec 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

badayvedat commented Dec 17, 2025 •

edited

Loading