onnxruntime
8de1639a - [webgpu] Enable DP4A MatMul generation path for Qualcomm (#24408)

Commit
280 days ago
With this PR, the generation speed for phi4 improves 2x on Qualcomm Adreno X1 GPU (11.1 tps -> 23.2 tps for simple inputs).
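
For readers unfamiliar with the term, DP4A generally refers to a GPU operation that takes the dot product of four packed 8-bit integers and adds it to a 32-bit accumulator. The following is a minimal Python sketch of that arithmetic only. It is not the code from this PR, and the helper names (`pack_i8x4`, `dp4a`, `dot_i8`) are hypothetical, chosen for illustration.

```python
def pack_i8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word, lane 0 in the low byte."""
    assert len(values) == 4
    word = 0
    for lane, v in enumerate(values):
        assert -128 <= v <= 127
        word |= (v & 0xFF) << (8 * lane)
    return word


def _lane(word, lane):
    """Extract one lane from a packed word and sign-extend it back to a Python int."""
    byte = (word >> (8 * lane)) & 0xFF
    return byte - 256 if byte >= 128 else byte


def dp4a(a_word, b_word, acc=0):
    """Dot product of two packed 4 x int8 words, added to an accumulator.

    Hardware uses a 32-bit accumulator; Python ints are unbounded, so this
    sketch does not model overflow.
    """
    return acc + sum(_lane(a_word, i) * _lane(b_word, i) for i in range(4))


def dot_i8(a, b):
    """Dot product of two int8 vectors (length a multiple of 4), four elements per step."""
    assert len(a) == len(b) and len(a) % 4 == 0
    acc = 0
    for i in range(0, len(a), 4):
        acc = dp4a(pack_i8x4(a[i:i + 4]), pack_i8x4(b[i:i + 4]), acc)
    return acc
```

Each `dp4a` call stands in for what would be a single instruction on hardware that supports it, so a quantized MatMul inner loop processes four 8-bit elements per step instead of one.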