[PT-Vulkan] aten::conv1d - opt: width-pack weight tensor (>2x speedup) (#118835)
## This diff
This optimization reduces the number of `texelFetch(uKernel, ...)` calls by a factor of 4, as sketched below.
We reuse MatMul's re-packing logic:
https://www.internalfb.com/code/fbsource/[7e8ef1b8adeda224a736f8cc4bf870e0a659df95]/xplat/caffe2/aten/src/ATen/native/vulkan/ops/Mm.cpp?lines=20%2C50
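For intuition, here is a minimal before/after sketch of the weight reads in GLSL. The coordinate layout and names (`k`, `c`, `n`) are illustrative assumptions, not the actual shader:

```glsl
// Before: one weight per texel position along the kernel's length, so
// reading 4 adjacent weights costs 4 texelFetch calls.
float w0 = texelFetch(uKernel, ivec3(k + 0, c, n), 0).x;
float w1 = texelFetch(uKernel, ivec3(k + 1, c, n), 0).x;
float w2 = texelFetch(uKernel, ivec3(k + 2, c, n), 0).x;
float w3 = texelFetch(uKernel, ivec3(k + 3, c, n), 0).x;

// After width-packing: the same 4 weights share one RGBA texel, so
// reading them is a single texelFetch.
vec4 w = texelFetch(uKernel, ivec3(k / 4, c, n), 0);
```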
## Future optimizations
We already batch reads from the input/weight tensors and writes to the output tensor.
Here are other ideas, which I won't pursue for now; (2) is the most doable.
1. **Batch reads/writes along the dimension that is most commonly > 1.** For the weights, the length dimension is clearly the right choice, and the input/output tensors could likely leverage their length dimensions too. However, `stride != 1` would complicate this optimization: adjacent output elements would no longer read adjacent (same-texel) input elements.
2. **Batch an optimal number of reads/writes.** Instead of defaulting to 4 elements (the contents of 1 texel), consider batching more, e.g. MatMul's 4x4 texel tile; see the sketch after this list.
3. **Obscure shader compiler optimizations.** MatMul seemed to benefit from several seemingly equivalent ways of writing the same code, so similar rewrites may help here.
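As a rough illustration of (2), a hypothetical GLSL fragment where one invocation computes a 4-wide strip of outputs, so each fetched weight is reused 4 times. All names (`uInput`, `uKernelWidth`, `x`) are assumptions, and it is simplified to a single input channel, `stride == 1`, no padding, and unpacked scalars; the real shader operates on packed vec4 texels:

```glsl
// out[x + i] = sum_k w[k] * in[x + i + k], for i in 0..3.
float acc[4] = float[4](0.0, 0.0, 0.0, 0.0);
for (int k = 0; k < uKernelWidth; ++k) {
  float w = texelFetch(uKernel, ivec3(k, 0, 0), 0).x;  // fetched once per k
  for (int i = 0; i < 4; ++i) {
    // The same weight feeds all 4 accumulators, cutting weight reads by 4x
    // versus one output per invocation.
    acc[i] += w * texelFetch(uInput, ivec3(x + i + k, 0, 0), 0).x;
  }
}
```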
Differential Revision: [D53204674](https://our.internmc.facebook.com/intern/diff/D53204674/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118835
Approved by: https://github.com/SS-JIA, https://github.com/liuk22