onnxruntime
8301eea3 - Attention CUDA BFloat16 Support (#25974)

### Description
Adds BFloat16 support to the CUDA Attention operator, extending the kernel implementations to accept BF16 input and output tensors.

### Motivation and Context
BFloat16 support already exists for GQA (Group Query Attention) but not for the regular Attention operator, which many models (e.g. the visual encoder of Gemma 3) require for inference: BF16 offers FP32-like numerical stability at lower memory and compute cost.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
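The commit message itself contains no usage example. As a hedged illustration only, the sketch below shows one way the BF16 path could be exercised from Python: a tiny graph that Casts float32 inputs to bfloat16, runs the `com.microsoft` Attention contrib op on the CUDA execution provider, and Casts the result back to float32 (sidestepping NumPy's lack of a native bfloat16 dtype). All shapes, tensor names, and the Cast wrapper are illustrative assumptions, not taken from the commit.

```python
# Minimal sketch (not from this commit): run the contrib Attention op in BF16
# on the CUDA execution provider. Shapes and names are illustrative.
import numpy as np
from onnx import TensorProto, helper, numpy_helper
import onnxruntime as ort

batch, seq_len, hidden, heads = 1, 16, 64, 4

# float32 graph I/O; Casts move the Attention computation itself to BF16.
inp = helper.make_tensor_value_info("x", TensorProto.FLOAT, [batch, seq_len, hidden])
out = helper.make_tensor_value_info("y", TensorProto.FLOAT, [batch, seq_len, hidden])

# Packed QKV projection weights/bias as expected by the contrib Attention op:
# weights are (input_hidden_size, 3 * hidden_size), bias is (3 * hidden_size,).
weight = np.random.randn(hidden, 3 * hidden).astype(np.float32)
bias = np.random.randn(3 * hidden).astype(np.float32)

nodes = [
    helper.make_node("Cast", ["x"], ["x_bf16"], to=TensorProto.BFLOAT16),
    helper.make_node("Cast", ["w"], ["w_bf16"], to=TensorProto.BFLOAT16),
    helper.make_node("Cast", ["b"], ["b_bf16"], to=TensorProto.BFLOAT16),
    helper.make_node(
        "Attention", ["x_bf16", "w_bf16", "b_bf16"], ["y_bf16"],
        domain="com.microsoft", num_heads=heads,
    ),
    helper.make_node("Cast", ["y_bf16"], ["y"], to=TensorProto.FLOAT),
]

graph = helper.make_graph(
    nodes, "attention_bf16_demo", [inp], [out],
    initializer=[
        numpy_helper.from_array(weight, name="w"),
        numpy_helper.from_array(bias, name="b"),
    ],
)
model = helper.make_model(
    graph,
    opset_imports=[
        helper.make_opsetid("", 17),
        helper.make_opsetid("com.microsoft", 1),
    ],
)

# CUDA EP only: the CPU provider does not necessarily implement BF16 Attention.
sess = ort.InferenceSession(
    model.SerializeToString(), providers=["CUDAExecutionProvider"]
)
x = np.random.randn(batch, seq_len, hidden).astype(np.float32)
y = sess.run(None, {"x": x})[0]
print(y.shape)  # (1, 16, 64)
```

In a real model exported in BF16, the Cast nodes would be unnecessary; they are used here only so the example can feed and read plain float32 NumPy arrays.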