onnxruntime
7c782f67 - [webgpu] Always use tile matmulnbits for block_size = 32 (#23140)

Committed 1 year ago
### Description

After the prefill-time optimization in #23102, always using tile matmulnbits for block_size = 32 appears to give better performance even on discrete GPUs for the phi3 model. On my NV RTX 2000 GPU, phi3 goes from 32.82 tokens/sec to 42.64 tokens/sec in easy mode, a gain of roughly 30%.
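The dispatch rule this commit describes can be sketched as a small selector, together with the arithmetic behind the reported gain. This is a minimal, hypothetical illustration and not the actual onnxruntime WebGPU code: the names `MatMulNBitsPath`, `select_path_before`, `select_path_after`, and `speedup` are invented here. The commit does not state the exact pre-change condition, so it is modeled as "tile path for block_size 32 only on non-discrete GPUs" purely as an assumption. Other block sizes are simply mapped to a default path in this sketch.

```python
from enum import Enum


class MatMulNBitsPath(Enum):
    # Hypothetical labels for the two implementation paths.
    DEFAULT = "default"
    TILE = "tile"


def select_path_before(block_size: int, is_discrete_gpu: bool) -> MatMulNBitsPath:
    # ASSUMPTION: the pre-change heuristic is modeled as using the tile path
    # for block_size 32 only on non-discrete GPUs. The real condition may differ.
    if block_size == 32 and not is_discrete_gpu:
        return MatMulNBitsPath.TILE
    return MatMulNBitsPath.DEFAULT


def select_path_after(block_size: int, is_discrete_gpu: bool) -> MatMulNBitsPath:
    # Rule described by the commit: block_size 32 always takes the tile path,
    # so the GPU type is no longer consulted. Other block sizes fall back to
    # a (hypothetical) default path in this sketch.
    if block_size == 32:
        return MatMulNBitsPath.TILE
    return MatMulNBitsPath.DEFAULT


def speedup(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain, e.g. 0.30 means 30% more tokens/sec."""
    return after_tps / before_tps - 1.0


if __name__ == "__main__":
    print(select_path_after(32, is_discrete_gpu=True))
    print(f"{speedup(32.82, 42.64):.1%}")  # prints 29.9%
```

With the reported numbers, 42.64 / 32.82 is about 1.299, which is where the "roughly 30%" figure comes from.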