Add channels_last for all torchbench models and enable it with run.py (#1371)
Summary:
To standardize performance evaluation and increase coverage, this PR adds `channels_last` support for all torchbench models and exposes it through the `run.py` entry point for debugging. It is a first step toward standardizing and increasing coverage for TorchBench, as part of the roadmap in https://github.com/pytorch/benchmark/issues/1293.
Taking `alexnet` as an example, run on a CLX 8280L (28 cores), eval wall time drops from 108.189 ms to 72.930 ms (about 1.48x faster) with `--channels-last`:
```shell
python run.py alexnet -d cpu -m eager -t eval
Running eval method from alexnet on cpu in eager mode with input batch size 128.
CPU Total Wall Time: 108.189 milliseconds
```
```shell
python run.py alexnet -d cpu -m eager -t eval --channels-last
Running eval method from alexnet on cpu in eager mode with input batch size 128.
CPU Total Wall Time: 72.930 milliseconds
```
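For context, a minimal sketch of what the `channels_last` layout changes for a 4D tensor. This is plain Python for illustration, not code from this PR; the helper names are hypothetical, and the 3x224x224 image shape is an assumption (only the batch size of 128 comes from the log above).

```python
def contiguous_strides(shape):
    """Strides (in elements) for the default NCHW-contiguous layout."""
    n, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape):
    """Strides (in elements) when the same NCHW-shaped tensor is stored as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


# Batch size 128 is taken from the log above; 3x224x224 is an assumed input shape.
shape = (128, 3, 224, 224)
print("contiguous:   ", contiguous_strides(shape))
print("channels_last:", channels_last_strides(shape))
```

The logical shape stays NCHW; only the strides change, so the channel values of one pixel sit next to each other in memory. In PyTorch this layout is usually requested with `model.to(memory_format=torch.channels_last)` and `x.contiguous(memory_format=torch.channels_last)`; whether this PR uses exactly those calls for every model is not shown here.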
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1371
Reviewed By: davidberard98
Differential Revision: D43273579
Pulled By: xuzhao9
fbshipit-source-id: 9597d996d27dd228445e3e8122e5e7131cc93669