onnxruntime
938b6075 - Optimize FlashAttention for M4 Max (20x speedup) (#27780)

Commit
4 days ago
MultiHeadAttention

Before: 58.3s
After: 2.89s
Speedup: 20x

### Motivation and Context

Tested with vision_encoder.onnx for https://huggingface.co/onnx-community/LightOnOCR-2-1B-ONNX