onnxruntime
6d28e2d2 - [webgpu] support smooth softmax for non-FA GQA implementation (#25285)

Commit
300 days ago
[webgpu] support smooth softmax for non-FA GQA implementation (#25285)

### Description

support smooth softmax for non-FA GQA implementation

This change depends on:
- #25269

Work items:
- [x] support smooth softmax
- [x] support bias
- [x] support head sink (per-head smooth softmax)

The following will not be included in this PR:
- support for FlashAttention
- support sliding window
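
For readers unfamiliar with the terms in the work items, the following is a minimal reference sketch in plain Python, not the WebGPU shader code from this PR. It assumes that smooth softmax adds one extra term to the softmax denominator (equivalent to a sink logit of 0), that a head sink replaces that 0 with a per-head value, and that bias is added to the raw scores before normalization. The function names `softmax_with_sink` and `smooth_softmax` are hypothetical and exist only for illustration.

```python
import math


def softmax_with_sink(scores, sink=None, bias=None):
    """Softmax over one attention row, with an optional sink logit.

    Illustrative assumptions (not taken from the PR's code):
    - `bias`, if given, is added elementwise to `scores` first.
    - `sink`, if given, contributes exp(sink) to the denominator only,
      so the returned probabilities sum to less than 1.
    """
    if bias is not None:
        scores = [s + b for s, b in zip(scores, bias)]

    # Subtract the running maximum (including the sink) for numerical stability.
    m = max(scores) if sink is None else max(max(scores), sink)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    if sink is not None:
        denom += math.exp(sink - m)
    return [e / denom for e in exps]


def smooth_softmax(scores, bias=None):
    """Smooth softmax, modeled here as a sink logit fixed at 0."""
    return softmax_with_sink(scores, sink=0.0, bias=bias)


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0]
    print(softmax_with_sink(row))            # ordinary softmax
    print(smooth_softmax(row))               # extra unit term in the denominator
    print(softmax_with_sink(row, sink=4.0))  # per-head sink value (hypothetical)
```

Under these assumptions, a larger sink value lets a head assign less total weight to the actual keys, which is why the work item describes head sink as a per-head form of smooth softmax.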