onnxruntime
3dfc2ae3 - [webgpu] Optimize MatMulNBits for f16 Block32 prefill performance (#23908)

Commit
319 days ago
[webgpu] Optimize MatMulNBits for f16 Block32 prefill performance (#23908)

### Description
This commit improves MatMulNBits f16 Block32 prefill performance by increasing the tiling size and improving memory efficiency. It achieves a +2x performance boost on Intel iGPUs for the Phi-3.5-mini f16 model.

### Motivation and Context
See above.