@sammysun0711

Hi FireRedTeam, thanks for your great work!

This PR adds FireRedASR optimizations for ROCm, targeting AMD Instinct MI300+ GPUs.

  • Add docker/Dockerfile.rocm to quickly set up a ROCm 7 environment for deployment.
  • Add PyTorch SDPA and xFormers attention support for performance optimization, selected through an environment variable: ATTENTION_BACKEND="SDPA" or ATTENTION_BACKEND="XFORMERS".
  • Fix the torch.load issue by passing weights_only=False for torch >= 2.6.
  • Add benchmark scripts and torch profiling support for performance analysis of the different attention backends.
  • Add a guide to FireRedASR optimization on ROCm in README.md.
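
As a rough illustration of the ATTENTION_BACKEND switch described above, here is a minimal sketch of how such an environment variable could be resolved before the attention module is built. The function name `resolve_attention_backend`, the "NATIVE" label, and the fallback to the native path when the variable is unset are assumptions made for this example, not necessarily what the PR implements.

```python
import os

# Backend names accepted by this sketch. "SDPA" and "XFORMERS" come from the
# PR description; "NATIVE" is a hypothetical label for the original code path.
SUPPORTED_BACKENDS = ("NATIVE", "SDPA", "XFORMERS")


def resolve_attention_backend(environ=None):
    """Return the attention backend name requested via ATTENTION_BACKEND.

    Assumption: an unset or empty variable falls back to the native path.
    """
    env = os.environ if environ is None else environ
    raw = env.get("ATTENTION_BACKEND", "").strip().upper()
    if not raw:
        return "NATIVE"
    if raw not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unsupported ATTENTION_BACKEND={raw!r}; "
            f"expected one of {SUPPORTED_BACKENDS}"
        )
    return raw


if __name__ == "__main__":
    # A real model would branch on this value, for example calling
    # torch.nn.functional.scaled_dot_product_attention for "SDPA" or
    # xformers.ops.memory_efficient_attention for "XFORMERS".
    print(resolve_attention_backend())
```

Resolving the value once at start-up and failing loudly on an unknown name keeps a typo in the variable from silently running the slower native path during a benchmark.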

Here are performance results with the example audio (batch size = 1) on a single MI308X, for your reference:

| ATTENTION_BACKEND  | RTF   | Performance gain vs. Native |
|--------------------|-------|-----------------------------|
| Native             | 0.063 | /                           |
| Torch SDPA         | 0.048 | 23.81%                      |
| xFormers Attention | 0.056 | 11.11%                      |
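
For reference, the gain column is consistent with the relative reduction in RTF compared with the native backend. The sketch below reproduces that arithmetic; it assumes the usual definition of real-time factor (processing time divided by audio duration), and the helper names `real_time_factor` and `gain_vs_native` are hypothetical, not taken from the PR's benchmark scripts.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF under the usual definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def gain_vs_native(native_rtf, backend_rtf):
    """Relative RTF reduction versus the native backend, in percent."""
    return (native_rtf - backend_rtf) / native_rtf * 100.0


if __name__ == "__main__":
    native = 0.063  # RTF values as reported in the table above
    for name, rtf in (("Torch SDPA", 0.048), ("xFormers Attention", 0.056)):
        print(f"{name}: {gain_vs_native(native, rtf):.2f}%")
```

A lower RTF is better, so a positive gain means the backend transcribes the same audio in less time than the native attention path.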

@kaituoxu
Collaborator

Thanks for your PR, we will review.
