Conversation

@tingqli tingqli commented Nov 15, 2025

The KV projection in cross-attention is computed at every decoding step, which is redundant since encoder_outputs does not change during the decoding phase. This PR adds a simple caching mechanism in cross-attention to avoid recomputing it. In my test case (batch-size=32, beam-size=3, audio-length=20s), end-to-end latency dropped from 20 seconds to 6.1 seconds on an H20.
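The idea can be sketched as follows: since the keys and values of cross-attention depend only on encoder_outputs, they can be projected once on the first decoding step and reused afterward. This is a minimal, hypothetical single-head illustration (class and method names are my own, not the PR's code), not the actual implementation:

```python
import math
import torch
import torch.nn as nn


class CachedCrossAttention(nn.Module):
    """Single-head cross-attention that caches the K/V projections of
    encoder_outputs across decoding steps (illustrative sketch only)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self._kv_cache = None  # (k, v) for the current encoder_outputs

    def reset_cache(self):
        # Call once per new utterance / batch before decoding starts.
        self._kv_cache = None

    def forward(self, query: torch.Tensor, encoder_outputs: torch.Tensor):
        # Project encoder outputs to K/V only on the first decoding step;
        # encoder_outputs is constant for the whole decode, so reuse after.
        if self._kv_cache is None:
            self._kv_cache = (self.k_proj(encoder_outputs),
                              self.v_proj(encoder_outputs))
        k, v = self._kv_cache
        q = self.q_proj(query)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v
```

With T decoding steps, this reduces the K/V projection cost from O(T) matmuls over the encoder sequence to O(1), which matters most for long audio and large beam sizes.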

@kaituoxu
Collaborator

Thanks for your PR; we will review it.
