onnxruntime
8159723b - [js/webgpu] Optimize matmulnbits (#22360)

Committed 1 year ago
### Description

This PR further optimizes matmulnbits, especially for iGPUs. On iGPUs, the phi3 demo improves from ~8 tokens/second to ~12 tokens/second.

Some TODOs:

1. Make the optimization more general by removing the `blockSize = 32` limitation.
2. Tune parameters such as `workgroupSize` and the components size (currently only `components = 1` is supported) and measure how performance changes.
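For readers unfamiliar with the operator, here is a minimal CPU reference of the blockwise 4-bit dequantize-and-multiply that matmulnbits computes. It is a sketch under stated assumptions, not the WebGPU shader from this PR. The function name `matmulNBitsRef`, the row-major layouts, low-nibble-first packing, one zero point per byte, and the default zero point of 8 (2^(bits-1)) when none is given are all illustrative assumptions. `blockSize` is an ordinary parameter here, which is the kind of generality the first TODO describes for the shader.

```typescript
// Hypothetical CPU reference for a blockwise 4-bit quantized matmul (NOT the PR's WebGPU kernel).
// Assumed layouts: A is M x K row-major floats; B is stored as N rows of K packed 4-bit values
// (two per byte, low nibble first); each block of `blockSize` consecutive K elements in a row
// of B has one scale and, optionally, one zero point (stored one per byte for simplicity).
function matmulNBitsRef(
  a: Float32Array, // M * K activations
  bPacked: Uint8Array, // N * ceil(K / 2) packed 4-bit weights
  scales: Float32Array, // N * nBlocks scales
  m: number,
  n: number,
  k: number,
  blockSize: number,
  zeroPoints?: Uint8Array, // N * nBlocks zero points (assumption: unpacked, one per byte)
): Float32Array {
  const nBlocks = Math.ceil(k / blockSize);
  const bytesPerRow = Math.ceil(k / 2);
  const out = new Float32Array(m * n);
  for (let col = 0; col < n; col++) {
    // Dequantize one row of B (one output column) once, then reuse it for every row of A.
    const w = new Float32Array(k);
    for (let i = 0; i < k; i++) {
      const byte = bPacked[col * bytesPerRow + (i >> 1)];
      const q = (i & 1) === 0 ? byte & 0x0f : byte >> 4; // even index: low nibble
      const blk = Math.floor(i / blockSize);
      const zp = zeroPoints ? zeroPoints[col * nBlocks + blk] : 8; // assumed default 2^(bits-1)
      w[i] = (q - zp) * scales[col * nBlocks + blk];
    }
    // Plain dot products; a GPU kernel would tile and vectorize this part.
    for (let row = 0; row < m; row++) {
      let acc = 0;
      for (let i = 0; i < k; i++) acc += a[row * k + i] * w[i];
      out[row * n + col] = acc;
    }
  }
  return out;
}

// Usage: K = 4 with blockSize = 2, so the single row of B has two blocks.
// Bytes 0xa9 and 0x87 unpack to q = [9, 10, 7, 8]; with zero point 8 and scales [0.5, 2]
// the dequantized weights are [0.5, 1, -2, 0].
const out = matmulNBitsRef(
  new Float32Array([1, 2, 3, 4]),
  new Uint8Array([0xa9, 0x87]),
  new Float32Array([0.5, 2]),
  1, 1, 4, 2,
);
console.log(out[0]); // -3.5
```

Dequantizing a row of B once and reusing it across all rows of A keeps the reference simple. It says nothing about how the optimized shader schedules work across a workgroup.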