[webgpu] Restore FP16 math in flash attention generation (#24994)
This PR restores FP16 math in the flash attention generation shader. Following
#24953, the attention scale is multiplied into Q before the QK product instead
of being applied afterwards, so the intermediate values stay within FP16 range
and do not overflow.
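A minimal NumPy sketch of why the scaling order matters (this is an illustration, not the actual WGSL shader; the head size, scale, and input magnitudes are made up):

```python
import numpy as np

# Assumed values for illustration: scale = 1/sqrt(head_size) with head_size = 64.
scale = np.float16(0.125)
q = np.full(4, 120.0, dtype=np.float16)   # large Q activations
k = np.full(4, 150.0, dtype=np.float16)   # large K activations

# Scale applied after QK: each partial product is 120 * 150 = 18000, and
# the running FP16 sum (4 * 18000 = 72000) exceeds the FP16 maximum of
# 65504, overflowing to inf before the scale can shrink it.
acc = np.float16(0.0)
for qi, ki in zip(q, k):
    acc = np.float16(acc + qi * ki)       # overflows to inf on the last step
scale_after_qk = acc * scale              # inf

# Scale folded into Q first (the #24953 approach): partial products are
# 15 * 150 = 2250, and the sum (~9000) stays comfortably in FP16 range.
acc = np.float16(0.0)
for qi, ki in zip(q * scale, k):
    acc = np.float16(acc + qi * ki)
scale_before_qk = acc                     # ~9000.0

print(scale_after_qk, scale_before_qk)    # inf vs. a finite FP16 value
```

The same reasoning applies per-element inside the shader's tiled QK loop: pre-scaling Q bounds every partial sum, which is what makes it safe to keep the accumulation in FP16 rather than widening to FP32.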