pytorch
68bc0fc0 - [inductor] a script to benchmark the perf impact from tensor layout (#99583)

Commit

1 year ago

[inductor] a script to benchmark the perf impact from tensor layout (#99583)

Follow up on Jason's idea of tensor layout tuning. Add a script to show the perf impact of tensor layout on convolution (more cases such as batch/layer norm and reduction will be added to the script).

For convolution, a quick test shows that using the channels last layout gives a 1.4x speedup:

```
baseline 4.509183883666992 test 3.178528070449829 speedup 1.419x
```

The speedup also depends on the input/weight shapes. E.g., changing the input channel count from 3 in the test to 8 raises the speedup to 2.1x.

The trace shows that cuDNN calls different kernels when the input layout changes to channels last.

<img width="997" alt="Screenshot 2023-04-19 at 5 27 54 PM" src="https://user-images.githubusercontent.com/52589240/233228656-4bdcac0a-7633-416a-82e1-17d8dc8ea9a6.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99583
Approved by: https://github.com/jansel
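The comparison described above can be sketched roughly as follows. This is not the script added by the commit; it is a minimal illustration, and the helper names (`benchmark_ms`, `speedup`, `compare_conv_layouts`), the tensor shapes, and the warmup/iteration counts are all assumptions chosen for the example. It times the same convolution with contiguous (NCHW) inputs and with channels last inputs, then reports the ratio in the same format as the output quoted in the commit message.

```python
import time


def benchmark_ms(fn, warmup=3, iters=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # untimed warmup runs
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1000.0 / iters


def speedup(baseline, test):
    """Ratio of baseline time to test time (greater than 1 means test is faster)."""
    return baseline / test


def compare_conv_layouts(in_channels=3, device=None):
    """Time conv2d with contiguous vs. channels last inputs (hypothetical shapes)."""
    import torch  # imported lazily so the timing helpers work without PyTorch
    import torch.nn.functional as F

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    on_cuda = str(device).startswith("cuda")

    x = torch.randn(8, in_channels, 64, 64, device=device)
    w = torch.randn(16, in_channels, 3, 3, device=device)
    # Same values, different memory layout (NHWC strides).
    x_cl = x.to(memory_format=torch.channels_last)
    w_cl = w.to(memory_format=torch.channels_last)

    def run(inp, weight):
        out = F.conv2d(inp, weight)
        if on_cuda:
            torch.cuda.synchronize()  # wait for the kernel so the timing is meaningful
        return out

    baseline = benchmark_ms(lambda: run(x, w))
    test = benchmark_ms(lambda: run(x_cl, w_cl))
    return baseline, test


if __name__ == "__main__":
    baseline, test = compare_conv_layouts()
    print(f"baseline {baseline} test {test} speedup {speedup(baseline, test):.3f}x")
```

Calling `compare_conv_layouts(in_channels=8)` would repeat the measurement with a different input channel count, which is the kind of shape change the commit message says affects the speedup. Results will vary with hardware, cuDNN version, and shapes, so the numbers from this sketch should not be expected to match those in the commit.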

Author

shunting314

Committer

pytorchmergebot

Parents

da322ea8
