[ROCm] Add Torch SDPA and xFormers optimization for FireRedASR #102

sammysun0711 · 2025-10-29T08:32:34Z

Hi FireRedTeam, thanks for your great work!

This PR adds FireRedASR optimizations for ROCm, targeting AMD Instinct MI300+ GPUs.

- Add docker/Dockerfile.rocm to quickly set up a ROCm 7 environment for deployment.
- Add PyTorch SDPA and xFormers attention support for performance optimization. The backend is selected with an environment variable: ATTENTION_BACKEND="SDPA" or ATTENTION_BACKEND="XFORMERS".
- Fix the torch.load issue by passing weights_only=False for torch >= 2.6.
- Add benchmark scripts and torch profiling support for performance analysis of the different attention backends.
- Add a guide to FireRedASR optimization on ROCm in README.md.
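As a rough illustration of the backend switch described above, here is a minimal sketch, not the code in this PR. The helper names `resolve_backend` and `attention` are hypothetical, and falling back to the native path when `ATTENTION_BACKEND` is unset or unrecognized is an assumption. The xFormers branch is imported lazily and has not been exercised here; it assumes a working xFormers install on a supported GPU.

```python
import math
import os

import torch
import torch.nn.functional as F


def resolve_backend(env=os.environ):
    """Map ATTENTION_BACKEND to 'SDPA', 'XFORMERS', or 'NATIVE' (assumed default)."""
    value = env.get("ATTENTION_BACKEND", "").strip().upper()
    return value if value in ("SDPA", "XFORMERS") else "NATIVE"


def attention(q, k, v, backend="NATIVE"):
    """Unmasked attention; q, k, v have shape (batch, heads, seq_len, head_dim)."""
    if backend == "SDPA":
        # Fused kernel shipped with PyTorch >= 2.0.
        return F.scaled_dot_product_attention(q, k, v)
    if backend == "XFORMERS":
        # Lazy import so the other backends work without xFormers installed.
        import xformers.ops as xops

        # xFormers expects (batch, seq_len, heads, head_dim).
        out = xops.memory_efficient_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
        )
        return out.transpose(1, 2)
    # Native path: explicit softmax(QK^T / sqrt(d)) V.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v
```

With this shape of dispatch, running with `ATTENTION_BACKEND=SDPA` set in the environment would route every call through `scaled_dot_product_attention`, while the numerical result should match the native path up to floating-point tolerance.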

Here are performance results with the example audio (batch size = 1) on a single MI308X for your reference:

| ATTENTION_BACKEND | RTF | Performance gain vs. Native |
| --- | --- | --- |
| Native | 0.063 | / |
| Torch SDPA | 0.048 | 23.81% |
| xFormers Attention | 0.056 | 11.11% |
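For readers checking the numbers, the gain column is consistent with the relative reduction in real-time factor (RTF, processing time divided by audio duration) against the native backend. The sketch below reproduces that arithmetic; the function names are hypothetical, and the formula is inferred from the table rather than taken from the benchmark scripts.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF: seconds of compute per second of audio (lower is better)."""
    return processing_seconds / audio_seconds


def gain_vs_native(native_rtf, backend_rtf):
    """Relative RTF reduction versus the native backend, in percent."""
    return (native_rtf - backend_rtf) / native_rtf * 100.0


NATIVE_RTF = 0.063  # value from the table above
for name, rtf in (("Torch SDPA", 0.048), ("xFormers Attention", 0.056)):
    print(f"{name}: {gain_vs_native(NATIVE_RTF, rtf):.2f}%")
```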

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

kaituoxu · 2025-11-24T05:17:23Z

Thanks for your PR, we will review.

sammysun0711 added 13 commits October 29, 2025 15:54

Add FireRedASR optimization on ROCm

974386f

Signed-off-by: Xiake Sun <xiake.sun@amd.com>

Minor update

c367305

Add attention mask support in batch size > 1 condition for Torch SDPA and xFormers attention

d6de6ba

Refactor xFormers bias_attn handling with BlockDiagonalMask for batch size > 1 case

e210318

Fix dtype

4dbda0b

Fix attn_bias creation in cross attention mask for audio input with variable length

aa3f757

Refactor xFormers backend

7efe26f

Update benchmark script to load audio from directory

2d6af13

Refactor benchmark run benchmark with audio directory with variable length data

b9bf68e

Align beam search parameters, set model default precision as FP16

e91314d

Bugfix for xFormers attention backend

f63eeb4

Reuse cached encoder kv proj for cross-attention in decoding phase

56a7397

Simplify xFormers backend, remove unused code

cb0a06a
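The idea behind the commit "Reuse cached encoder kv proj for cross-attention in decoding phase" can be sketched as follows. Encoder outputs do not change between decoding steps, so their key and value projections only need to be computed once per utterance. This is an illustrative sketch under assumptions, not the PR's implementation: the class `CachedCrossAttention`, its attribute names, and the `reset_cache` method are all hypothetical, and beam-search reordering of the cache is not handled.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CachedCrossAttention(nn.Module):
    """Cross-attention that projects encoder K/V once and reuses them."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.kv_cache = None  # (k, v) computed from the encoder output

    def _split_heads(self, x):
        # (batch, time, d_model) -> (batch, heads, time, head_dim)
        b, t, _ = x.shape
        return x.view(b, t, self.n_heads, -1).transpose(1, 2)

    def reset_cache(self):
        """Call before decoding a new utterance."""
        self.kv_cache = None

    def forward(self, x, enc_out):
        if self.kv_cache is None:
            # First decoding step: project the encoder output once.
            self.kv_cache = (
                self._split_heads(self.k_proj(enc_out)),
                self._split_heads(self.v_proj(enc_out)),
            )
        k, v = self.kv_cache
        q = self._split_heads(self.q_proj(x))
        out = F.scaled_dot_product_attention(q, k, v)
        b, _, t, _ = out.shape
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))
```

Each later decoding step then pays only for the query projection and the attention itself, which matters most when the encoder sequence is long relative to the number of decoded tokens.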
