onnxruntime
25f42746 - [js/webgpu] Optimize ConvTranspose (Continue) (#23429)

Commit

1 year ago

[js/webgpu] Optimize ConvTranspose (Continue) (#23429)

BUG #23273

This PR makes the following optimizations:

1. When the number of output channels is one:
   1) Calculate the offset before the input-channel loop, to reduce the number of indices-to-offset calculations.
   2) Split `inputChannelsPerGroup` into `inputChannelsPerGroupInt` and `inputChannelsRemainder` parts, so that we can always access four elements at a time for the `inputChannelsPerGroupInt` part.
2. Use a precise initial value to reduce useless loop iterations. Thanks to @jiangzhaoming for the suggestion.

With this PR, ConvTranspose time drops from 8.4s to 3.7s on Intel Meteor Lake, and from 2.7s to 1.6s on NV RTX 2000 Ada.
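The channel split described in the commit message can be illustrated with a small CPU-side sketch. This is not the shader code from the PR: the helper names `splitChannels` and `dotSplit` are hypothetical, and the assumption that `inputChannelsPerGroupInt` is the largest multiple of 4 not exceeding `inputChannelsPerGroup` is inferred from the description, not confirmed by the source.

```typescript
// Hypothetical helper (not from the PR): split a channel count into a part
// that is a multiple of 4 and a remainder of 0..3 channels.
function splitChannels(inputChannelsPerGroup: number): {
  inputChannelsPerGroupInt: number;
  inputChannelsRemainder: number;
} {
  const inputChannelsRemainder = inputChannelsPerGroup % 4;
  const inputChannelsPerGroupInt = inputChannelsPerGroup - inputChannelsRemainder;
  return { inputChannelsPerGroupInt, inputChannelsRemainder };
}

// Hypothetical illustration: accumulate x[c] * w[c] over all channels,
// reading four values per iteration for the multiple-of-4 part and one
// value per iteration for the remainder.
function dotSplit(x: number[], w: number[]): number {
  const { inputChannelsPerGroupInt, inputChannelsRemainder } = splitChannels(x.length);
  let sum = 0;
  // Main loop: always touches exactly four elements, like a vec4 access.
  for (let c = 0; c < inputChannelsPerGroupInt; c += 4) {
    sum += x[c] * w[c] + x[c + 1] * w[c + 1] + x[c + 2] * w[c + 2] + x[c + 3] * w[c + 3];
  }
  // Tail loop: handles the 0..3 leftover channels one at a time.
  const end = inputChannelsPerGroupInt + inputChannelsRemainder;
  for (let c = inputChannelsPerGroupInt; c < end; c++) {
    sum += x[c] * w[c];
  }
  return sum;
}
```

With six channels, for instance, the main loop would cover channels 0 to 3 in one step and the tail loop would cover channels 4 and 5, so the four-wide path never needs a bounds check.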

References

#23429 - [js/webgpu] Optimize ConvTranspose (Continue)

Author

qjia7

Parents

ff8465ed
