onnxruntime
80d8931f - [webgpu] Use subgroup for matmulnbits (#23224)

349 days ago
### Description

This PR uses subgroups to implement matmulnbits when tile_m > 1 on Intel devices. With this PR, prefill time for a 500-token prompt on phi3 drops from 8.5 s to 3.5 s on Intel Meteor Lake.