onnxruntime
8159723b - [js/webgpu] Optimize matmulnbits (#22360)

Commit

1 year ago

[js/webgpu] Optimize matmulnbits (#22360)

Description

This PR further optimizes matmulnbits, especially for iGPUs. On iGPUs, the phi3 demo improves from ~8 tokens/second to ~12 tokens/second (roughly 1.5x).

Some TODOs:
1. Make the optimization more general by removing the blockSize = 32 limitation.
2. Tune parameters such as workgroupSize and the components size (currently only components = 1 is supported) to measure how performance changes.
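For context on what the blockSize = 32 limitation refers to, here is a minimal scalar CPU reference sketch of the blockwise 4-bit dequantize-and-multiply that a MatMulNBits kernel computes. It is not the WebGPU shader from this PR. The function name `matMulNBitsRef` is hypothetical, and the layout is an assumption: B packed as [N, K / 32, 16] bytes with two 4-bit values per byte (low nibble first), one scale per 32-element block, and a fixed zero point of 8 instead of an optional zero-point input.

```typescript
// Hypothetical reference sketch, NOT the onnxruntime WebGPU kernel.
// Assumed layout (4-bit weights, blockSize = 32):
//   a:       Float32Array [M, K], row-major
//   bPacked: Uint8Array [N, K / 32, 16]; each byte holds two 4-bit values,
//            low nibble first (assumption)
//   scales:  Float32Array [N, K / 32], one scale per block
// Zero point is assumed fixed at 8, the midpoint of the 4-bit range.
const BLOCK_SIZE = 32;
const BITS = 4;
const DEFAULT_ZERO_POINT = 1 << (BITS - 1); // 8

function matMulNBitsRef(
  a: Float32Array,
  bPacked: Uint8Array,
  scales: Float32Array,
  M: number,
  N: number,
  K: number,
): Float32Array {
  if (K % BLOCK_SIZE !== 0) {
    throw new Error("this sketch only supports K divisible by blockSize = 32");
  }
  const blocksPerCol = K / BLOCK_SIZE;
  const bytesPerBlock = (BLOCK_SIZE * BITS) / 8; // 16 bytes per block
  const y = new Float32Array(M * N);
  for (let n = 0; n < N; n++) {
    // Dequantize the K weights for output column n once: (q - zeroPoint) * scale.
    const bCol = new Float32Array(K);
    for (let blk = 0; blk < blocksPerCol; blk++) {
      const scale = scales[n * blocksPerCol + blk];
      const base = (n * blocksPerCol + blk) * bytesPerBlock;
      for (let i = 0; i < bytesPerBlock; i++) {
        const byte = bPacked[base + i];
        const k = blk * BLOCK_SIZE + 2 * i;
        bCol[k] = ((byte & 0x0f) - DEFAULT_ZERO_POINT) * scale;
        bCol[k + 1] = ((byte >> 4) - DEFAULT_ZERO_POINT) * scale;
      }
    }
    // Reuse the dequantized column for every row of A: y[m][n] = dot(a[m], bCol).
    for (let m = 0; m < M; m++) {
      let acc = 0;
      for (let k = 0; k < K; k++) acc += a[m * K + k] * bCol[k];
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

Dequantizing each column of B once and reusing it across all rows of A is the simplest form of data reuse. A GPU kernel tunes the same trade-off through parameters such as workgroupSize and the vector components size mentioned in the TODOs.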

References

#22360 - [js/webgpu] Optimize matmulnbits

Author

qjia7

Parents

2bc37544
