Enable AMP support (BFloat16) on CPU (#1516)
Summary:
This PR adds AMP (automatic mixed precision) support on CPU to TorchBench, contributing to https://github.com/pytorch/benchmark/issues/1293.
To stay compatible with the current AMP implementation, `--precision` accepts the following options:
`--precision bf16`: use `enable_bf16` to convert model and inputs to bf16
`--precision amp_bf16`: use `torch.cpu.amp.autocast(dtype=torch.bfloat16)` (can extend to cuda bf16 when ready)
`--precision amp_fp16`: use `torch.cuda.amp.autocast(dtype=torch.float16)` (can extend to cpu fp16 when ready)
`--precision amp`: use `torch.autocast(device)`, same as `--amp`
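
The option semantics above can be summarized as a small lookup, shown here as a hedged sketch rather than the code in this PR. `PrecisionPlan` and `plan_for` are hypothetical names introduced only for illustration, the dtype and device values are plain strings so the snippet runs without PyTorch, and raising `ValueError` for a device that an option does not yet cover is an assumption about how such a case might be handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrecisionPlan:
    """Hypothetical description of what a --precision value implies."""
    cast_model_and_inputs: bool     # True: convert model and inputs up front
    autocast_device: Optional[str]  # device type for autocast, or None
    dtype: Optional[str]            # dtype name, or None for the default


def plan_for(precision: str, device: str) -> PrecisionPlan:
    """Map a --precision value to a plan (illustrative only)."""
    if precision == "fp32":
        return PrecisionPlan(False, None, None)
    if precision == "bf16":
        # Full conversion of model and inputs, no autocast context.
        return PrecisionPlan(True, None, "bfloat16")
    if precision == "amp_bf16":
        # Described above as CPU autocast; CUDA bf16 is "when ready".
        if device != "cpu":
            raise ValueError("amp_bf16 is sketched here for cpu only")
        return PrecisionPlan(False, "cpu", "bfloat16")
    if precision == "amp_fp16":
        # Described above as CUDA autocast; CPU fp16 is "when ready".
        if device != "cuda":
            raise ValueError("amp_fp16 is sketched here for cuda only")
        return PrecisionPlan(False, "cuda", "float16")
    if precision == "amp":
        # Autocast on the selected device with its default dtype.
        return PrecisionPlan(False, device, None)
    raise ValueError(f"unknown precision: {precision}")


print(plan_for("amp_bf16", "cpu"))
```

The point of the split is that `bf16` changes the stored dtype of weights and inputs, while the `amp_*` options leave them in fp32 and let autocast choose a lower precision per operator.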
### Performance
Tested on a Copper Lake machine.
$ python run.py alexnet -d cpu -m eager -t eval --precision fp32
Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision fp32.
CPU Total Wall Time: 92.600 milliseconds
CPU Peak Memory: 1.1299 GB
$ python run.py alexnet -d cpu -m eager -t eval --precision bf16
Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision bf16.
CPU Total Wall Time: 56.580 milliseconds
CPU Peak Memory: 0.6934 GB
$ python run.py alexnet -d cpu -m eager -t eval --precision amp_bf16
Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision amp_bf16.
CPU Total Wall Time: 71.385 milliseconds
CPU Peak Memory: 0.9922 GB
$ python run.py alexnet -d cpu -m eager -t train --precision fp32
Running train method from alexnet on cpu in eager mode with input batch size 128 and precision fp32.
CPU Total Wall Time: 306.164 milliseconds
CPU Peak Memory: 2.0977 GB
$ python run.py alexnet -d cpu -m eager -t train --precision bf16
Running train method from alexnet on cpu in eager mode with input batch size 128 and precision bf16.
CPU Total Wall Time: 180.958 milliseconds
CPU Peak Memory: 1.2686 GB
$ python run.py alexnet -d cpu -m eager -t train --precision amp_bf16
Running train method from alexnet on cpu in eager mode with input batch size 128 and precision amp_bf16.
CPU Total Wall Time: 233.332 milliseconds
CPU Peak Memory: 2.0117 GB
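
To make the comparison easier to read, the short script below derives ratios relative to fp32 from the numbers reported above. It is only arithmetic on the logged values. The names `wall_ms`, `peak_gb`, and `ratio_vs_fp32` are introduced here for illustration and are not part of TorchBench.

```python
# Wall times (ms) and peak memory (GB) copied from the alexnet runs above.
wall_ms = {
    "eval": {"fp32": 92.600, "bf16": 56.580, "amp_bf16": 71.385},
    "train": {"fp32": 306.164, "bf16": 180.958, "amp_bf16": 233.332},
}
peak_gb = {
    "eval": {"fp32": 1.1299, "bf16": 0.6934, "amp_bf16": 0.9922},
    "train": {"fp32": 2.0977, "bf16": 1.2686, "amp_bf16": 2.0117},
}


def ratio_vs_fp32(values):
    """Return fp32_value / value for every non-fp32 precision."""
    base = values["fp32"]
    return {p: base / v for p, v in values.items() if p != "fp32"}


for mode in ("eval", "train"):
    speedups = ratio_vs_fp32(wall_ms[mode])
    mem_savings = ratio_vs_fp32(peak_gb[mode])
    for precision in speedups:
        print(
            f"{mode:5s} {precision:9s} "
            f"speedup {speedups[precision]:.2f}x, "
            f"memory {1 / mem_savings[precision]:.0%} of fp32"
        )
```

On these single runs, `bf16` is about 1.64x (eval) and 1.69x (train) faster than fp32, and `amp_bf16` about 1.30x and 1.31x. Peak memory drops to roughly 60% of fp32 with `bf16`, while `amp_bf16` stays close to the fp32 footprint in training.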
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1516
Reviewed By: aaronenyeshi
Differential Revision: D44883144
Pulled By: xuzhao9
fbshipit-source-id: 75251f9eec128b3a1dbca39540193b89059ec183