onnxruntime
abdf8b7c - [js/webgpu] Optimize broadcast binary. (#18185)

Commit
2 years ago
### Description
Currently, the binary algorithms are divided into a vectorized one (efficient) and a non-vectorized one (less efficient). The following situations use the vectorized one:
1. A or B's shape length is 1.
2. The shared dimensions length of A and B is divisible by 4.
3. A and B have the same shape.

This PR adds another situation that uses the vectorized algorithm:
4. A or B's last dimension is divisible by 4.

With this change, the aggregate time of Add in sam-b-encoder drops from 409.12 ms to 309.65 ms on Intel ADL.
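The four situations in the description can be read as a single predicate over the two input shapes. The sketch below is only an illustration of that reading, not the code from this PR: the names `canVectorize`, `sharedTrailingSize`, `size`, and `lastDim` are hypothetical. It also rests on two assumptions: "shape length is 1" is taken to mean the tensor holds a single element, and "shared dimensions length" is taken to mean the product of the trailing dimensions on which A and B agree.

```typescript
// Hypothetical helpers for illustration; none of these names come from onnxruntime.
type Shape = readonly number[];

// Total number of elements described by a shape.
const size = (dims: Shape): number => dims.reduce((acc, d) => acc * d, 1);

// Last dimension of a shape (a rank-0 shape is treated as having last dimension 1).
const lastDim = (dims: Shape): number =>
  dims.length > 0 ? dims[dims.length - 1] : 1;

// Assumption: "shared dimensions length" is the product of the trailing
// dimensions that are identical in A and B, stopping at the first mismatch.
function sharedTrailingSize(a: Shape, b: Shape): number {
  let shared = 1;
  for (let i = 1; i <= Math.min(a.length, b.length); i++) {
    const da = a[a.length - i];
    const db = b[b.length - i];
    if (da !== db) break;
    shared *= da;
  }
  return shared;
}

// Returns true when one of the four situations from the description applies.
function canVectorize(a: Shape, b: Shape): boolean {
  // Situation 3: A and B have the same shape.
  const sameShape = a.length === b.length && a.every((d, i) => d === b[i]);
  if (sameShape) return true;

  // Situation 1 (assumed reading): A or B holds a single element.
  if (size(a) === 1 || size(b) === 1) return true;

  // Situation 2: the shared trailing dimensions are divisible by 4.
  if (sharedTrailingSize(a, b) % 4 === 0) return true;

  // Situation 4 (added by this PR): A's or B's last dimension is divisible by 4.
  return lastDim(a) % 4 === 0 || lastDim(b) % 4 === 0;
}
```

Under these assumptions, shapes such as `[3, 8]` and `[3, 1]` share no trailing dimension, so they only qualify through situation 4, while `[3, 5]` and `[3, 1]` would still fall back to the non-vectorized path.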