Add AMP train support and channels_last support to timm models (#900)
Summary:
This PR adds AMP (automatic mixed precision) support, selected with --precision amp, to the timm train tests.
It also adds a --channels-last option to both the train and eval tests.
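
For readers unfamiliar with the two options, here is a minimal sketch of what an AMP train step with channels-last tensors can look like in plain PyTorch. It is not the code added by this PR: the toy model, the `train_step` helper, and the CPU bfloat16 fallback are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_step(model, optimizer, scaler, inputs, targets, device_type, amp_dtype):
    """One illustrative AMP train step (hypothetical helper, not from the PR)."""
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass under autocast so eligible ops use the lower precision.
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        logits = model(inputs)
    # Compute the loss in float32 for numerical stability.
    loss = F.cross_entropy(logits.float(), targets)
    # GradScaler guards against fp16 gradient underflow; it is a no-op when disabled.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: float16 on CUDA, bfloat16 on CPU so the sketch also runs without a GPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

# Toy stand-in for a timm model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device)

# channels_last: convert both the model weights and the input batch to NHWC layout.
model = model.to(memory_format=torch.channels_last)
inputs = torch.randn(4, 3, 32, 32, device=device).contiguous(
    memory_format=torch.channels_last
)
targets = torch.randint(0, 10, (4,), device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

loss_value = train_step(model, optimizer, scaler, inputs, targets, device, amp_dtype)
```

Both the model and the inputs are converted because a channels-last input fed to a model with contiguous-format weights may be converted back internally, which can hide any layout benefit.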
Pull Request resolved: https://github.com/pytorch/benchmark/pull/900
Test Plan:
```
$ python run.py timm_nfnet -t train -d cuda --precision amp
Running train method from timm_nfnet on cuda in eager mode.
GPU Time: 256.267 milliseconds
CPU Total Wall Time: 256.305 milliseconds
```
```
$ python run.py timm_nfnet -t train -d cuda --precision amp --channels-last
Running train method from timm_nfnet on cuda in eager mode.
GPU Time: 234.359 milliseconds
CPU Total Wall Time: 234.450 milliseconds
```
Reviewed By: erichan1
Differential Revision: D36380817
Pulled By: xuzhao9
fbshipit-source-id: c3ab231a7fde7211e484de72e24059b64b5fd3c4