onnxruntime
c5b48ae3 - [webgpu] Restore FP16 math in flash attention generation (#24994)

[webgpu] Restore FP16 math in flash attention generation (#24994)

This PR restores FP16 math in the flash attention generation shader. Following the changes in #24953, it multiplies Q by the scale factor first, instead of applying the scale after the QK product, to avoid data overflow in FP16.
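A minimal sketch of why the order of scaling matters. The helper `to_fp16` below is a crude stand-in for half-precision arithmetic (it only models overflow past the FP16 maximum of 65504, not precision loss), and the function names are illustrative, not taken from the shader. Accumulating the raw Q·K dot product in FP16 can exceed 65504 and overflow to infinity, while scaling Q by 1/sqrt(d) up front keeps every partial sum in range:

```python
import math

FP16_MAX = 65504.0  # largest finite half-precision value

def to_fp16(x: float) -> float:
    # crude fp16 simulation: only models overflow to +/- infinity,
    # which is enough to illustrate the failure mode
    if x > FP16_MAX:
        return math.inf
    if x < -FP16_MAX:
        return -math.inf
    return x

def qk_scale_after(q, k, scale):
    # overflow-prone order: accumulate Q.K in fp16, then scale the result
    acc = 0.0
    for qi, ki in zip(q, k):
        acc = to_fp16(acc + to_fp16(qi * ki))
    return to_fp16(acc * scale)

def qk_scale_first(q, k, scale):
    # safer order: fold the scale into Q before the dot product
    acc = 0.0
    for qi, ki in zip(q, k):
        acc = to_fp16(acc + to_fp16(to_fp16(qi * scale) * ki))
    return to_fp16(acc)

d = 128
scale = 1.0 / math.sqrt(d)  # typical attention scale
q = [40.0] * d
k = [40.0] * d

# raw dot product is 128 * 40 * 40 = 204800 > 65504, so the
# late-scaled accumulation overflows before the scale is applied
print(qk_scale_after(q, k, scale))   # inf
print(qk_scale_first(q, k, scale))   # finite (~18101.9)
```

Folding the scale into Q costs nothing extra per element and bounds the intermediate partial sums, which is why moving the multiply earlier restores FP16 viability.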