pytorch
86c76525 - [inductor] layout optimization for conv (#99773)

A convolution kernel with channels-last inputs runs much faster than one with contiguous inputs. This PR leverages that by optimizing tensor layouts so that convolutions receive channels-last inputs. Some care needs to be taken to avoid converting tensor layouts back and forth between contiguous and channels last, since those extra copies hurt performance significantly.

Latest perf numbers [here](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2024%20May%202023%2023%3A40%3A37%20GMT&stopTime=Wed%2C%2031%20May%202023%2023%3A40%3A37%20GMT&granularity=hour&suite=torchbench&mode=training&dtype=amp&lBranch=shunting-layout-opt-19&lCommit=baa797fc100688dfb044fbcbdebcfd2591710f78&rBranch=main&rCommit=999bae0f54108ffc5b7cf2524a02a83901554b16):
- TB: 1.64x -> 1.69x
- HF: 1.79x -> 1.78x (random noise)
- TIMM: 1.51x -> 1.65x

Layout optimization is currently disabled for dynamic shapes, since that combination shows a perf loss. Follow-up issue: https://github.com/pytorch/pytorch/issues/102670

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99773
Approved by: https://github.com/jansel
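The speedup being exploited here can be observed directly in eager PyTorch by comparing a convolution on contiguous versus channels-last inputs. The sketch below is not part of the PR; the layer sizes, iteration count, and timing helper are arbitrary choices for illustration, and it converts both the module weights and the input to channels last once up front so no layout copies happen inside the timed loop (the back-and-forth copies the commit message warns about).

```python
import time
import torch
import torch.nn as nn

# Illustrative micro-benchmark, not from the PR: compare a conv on
# contiguous (NCHW) inputs vs. channels-last (NHWC) inputs.
device = "cuda" if torch.cuda.is_available() else "cpu"
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).to(device)
x = torch.randn(32, 64, 56, 56, device=device)

def bench(inp, module, iters=50):
    # Simple wall-clock timing; synchronize around the timed region on GPU.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            module(inp)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Baseline: contiguous layout.
t_contig = bench(x, conv)

# Channels-last layout: convert weights and input once, outside the loop,
# so the timed region contains no layout conversions.
conv_cl = conv.to(memory_format=torch.channels_last)
x_cl = x.to(memory_format=torch.channels_last)
t_cl = bench(x_cl, conv_cl)

print(f"contiguous: {t_contig * 1e3:.3f} ms/iter, "
      f"channels_last: {t_cl * 1e3:.3f} ms/iter")
```

With this PR, `torch.compile` users do not need to do this conversion by hand for the benchmarked suites; inductor chooses channels-last layouts for convolution inputs itself, while trying to avoid inserting redundant layout copies.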