onnxruntime
4adef01e - [CUDA] Update sm check for flash attention (#24584)

[CUDA] Update sm check for flash attention (#24584)

### Description
Currently, flash attention is only enabled for sm8x and sm90, which means Blackwell GPUs will not use flash attention. This change enables flash attention for sm > 90 (see the sketch after this message). Note that the flash attention implementation is not optimized for Blackwell, but it should be able to run on Blackwell GPUs.

Future work:
* Integrate flash attention for Hopper: https://github.com/Dao-AILab/flash-attention/tree/main/hopper
* Integrate FMHA for Blackwell: https://github.com/NVIDIA/cutlass/tree/main/examples/77_blackwell_fmha
* Update cuDNN and the cuDNN frontend to the latest version (so that we can use the cuDNN flash attention for Blackwell).

### Motivation and Context
ORT GenAI is slow on RTX 5090.
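
A minimal sketch of the relaxed SM gate described above. The function and variable names here are hypothetical and do not reflect the actual ONNX Runtime source; it only illustrates the change from allowlisting sm8x/sm90 to accepting any compute capability of 8.0 or newer.

```cpp
// Hypothetical helper: decide whether to dispatch to the flash attention kernel
// based on the device's compute capability (sm_major.sm_minor).
bool UseFlashAttention(int sm_major, int sm_minor) {
  const int sm = sm_major * 10 + sm_minor;

  // Before this change (assumed): only Ampere/Ada (sm 80, 86, 89) and
  // Hopper (sm 90) were allowed, so Blackwell (sm >= 100) fell through
  // to a slower attention path.
  // return (sm >= 80 && sm <= 90);

  // After this change (assumed): any architecture at sm 80 or newer,
  // including Blackwell, runs the existing (unoptimized for Blackwell)
  // flash attention implementation.
  return sm >= 80;
}
```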