onnxruntime
3dfc2ae3 - [webgpu] Optimize MatMulNBits for f16 Block32 prefill performance (#23908)

Commit

1 year ago

[webgpu] Optimize MatMulNBits for f16 Block32 prefill performance (#23908)

### Description

This commit improves MatMulNBits f16 Block32 prefill performance by increasing the tiling size and improving memory efficiency. It achieves a 2x+ performance boost on Intel iGPUs for the Phi-3.5-mini f16 model.

### Motivation and Context

See above.
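To illustrate why a larger tiling size can improve memory efficiency, here is a minimal sketch in plain Python. It is not the WGSL shader from this commit: `matmul_tiled`, its load counter, and the tile sizes are hypothetical, and the model is idealized (each output tile stages its rows of A and columns of B exactly once, and quantization is ignored).

```python
def matmul_tiled(a, b, tile_m, tile_n):
    """Tiled matmul that also counts element loads under an idealized model.

    Hypothetical helper for illustration only; not the commit's shader.
    """
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0] * n for _ in range(m)]
    loads = 0
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            i1, j1 = min(i0 + tile_m, m), min(j0 + tile_n, n)
            # Stage the rows of A and columns of B needed by this output tile.
            a_tile = [a[i] for i in range(i0, i1)]
            b_tile = [[b[p][j] for p in range(k)] for j in range(j0, j1)]
            loads += (i1 - i0) * k + (j1 - j0) * k
            # Every staged row is reused for each staged column, and vice versa.
            for ii, row in enumerate(a_tile):
                for jj, col in enumerate(b_tile):
                    c[i0 + ii][j0 + jj] = sum(x * y for x, y in zip(row, col))
    return c, loads


a = [[i + j for j in range(4)] for i in range(8)]      # 8 x 4
b = [[i * j + 1 for j in range(8)] for i in range(4)]  # 4 x 8

_, loads_small = matmul_tiled(a, b, 2, 2)
_, loads_large = matmul_tiled(a, b, 4, 4)
print(loads_small, loads_large)  # prints "256 128"
```

In this model the total number of loads is roughly `2 * M * N * K / T` for square `T x T` tiles, so doubling the tile edge halves the traffic. Real kernels are also bounded by workgroup memory and register pressure, which this sketch does not capture.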

References

#23908 - [webgpu] Optimize MatMulNBits for f16 Block32 prefill performance

Author

daijh

Parents

e5e906ee
