benchmark
147a0494 - Enable amp support (BFloat16) in CPU (#1516)

3 years ago
Enable amp support (BFloat16) in CPU (#1516)

Summary:
This PR adds amp support on CPU to TorchBench, which contributes to https://github.com/pytorch/benchmark/issues/1293. To stay compatible with the current amp implementation, `--precision` takes the following options:

- `--precision bf16`: use `enable_bf16` to convert the model and inputs to bf16
- `--precision amp_bf16`: use `torch.cpu.amp.autocast(dtype=torch.bfloat16)` (can extend to cuda bf16 when ready)
- `--precision amp_fp16`: use `torch.cuda.amp.autocast(dtype=torch.float16)` (can extend to cpu fp16 when ready)
- `--precision amp`: use `torch.autocast(device)`, same as `--amp`

### Performance

Tested on a Copper Lake machine.

    $ python run.py alexnet -d cpu -m eager -t eval --precision fp32
    Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision fp32.
    CPU Total Wall Time: 92.600 milliseconds
    CPU Peak Memory: 1.1299 GB

    $ python run.py alexnet -d cpu -m eager -t eval --precision bf16
    Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision bf16.
    CPU Total Wall Time: 56.580 milliseconds
    CPU Peak Memory: 0.6934 GB

    $ python run.py alexnet -d cpu -m eager -t eval --precision amp_bf16
    Running eval method from alexnet on cpu in eager mode with input batch size 128 and precision amp_bf16.
    CPU Total Wall Time: 71.385 milliseconds
    CPU Peak Memory: 0.9922 GB

    $ python run.py alexnet -d cpu -m eager -t train --precision fp32
    Running train method from alexnet on cpu in eager mode with input batch size 128 and precision fp32.
    CPU Total Wall Time: 306.164 milliseconds
    CPU Peak Memory: 2.0977 GB

    $ python run.py alexnet -d cpu -m eager -t train --precision bf16
    Running train method from alexnet on cpu in eager mode with input batch size 128 and precision bf16.
    CPU Total Wall Time: 180.958 milliseconds
    CPU Peak Memory: 1.2686 GB

    $ python run.py alexnet -d cpu -m eager -t train --precision amp_bf16
    Running train method from alexnet on cpu in eager mode with input batch size 128 and precision amp_bf16.
    CPU Total Wall Time: 233.332 milliseconds
    CPU Peak Memory: 2.0117 GB

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1516

Reviewed By: aaronenyeshi

Differential Revision: D44883144

Pulled By: xuzhao9

fbshipit-source-id: 75251f9eec128b3a1dbca39540193b89059ec183
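As a rough illustration of how the `--precision` values described in the summary could map onto autocast contexts, here is a minimal sketch. It is not the TorchBench implementation: the names `AUTOCAST_OPTIONS`, `autocast_settings`, and `precision_context` are hypothetical, and the sketch uses the generic `torch.autocast(device_type=..., dtype=...)` entry point in place of the `torch.cpu.amp.autocast` and `torch.cuda.amp.autocast` calls named in the summary. The `bf16` option is deliberately absent because, per the summary, it converts the model and inputs instead of using autocast.

```python
import contextlib

# Hypothetical table: --precision value -> (device type, dtype name) for autocast.
# "bf16" and "fp32" are not listed: they do not run under autocast.
AUTOCAST_OPTIONS = {
    "amp_bf16": ("cpu", "bfloat16"),
    "amp_fp16": ("cuda", "float16"),
}


def autocast_settings(precision, device):
    """Return (device_type, dtype_name) for autocast, or None if autocast is not used."""
    if precision == "amp":
        # No explicit dtype: torch.autocast falls back to the device's default.
        return (device, None)
    return AUTOCAST_OPTIONS.get(precision)


def precision_context(precision, device):
    """Return a context manager to wrap the eval or train call in."""
    settings = autocast_settings(precision, device)
    if settings is None:
        return contextlib.nullcontext()
    # Imported lazily so the mapping can be inspected without PyTorch installed.
    import torch
    device_type, dtype_name = settings
    if dtype_name is None:
        return torch.autocast(device_type=device_type)
    return torch.autocast(device_type=device_type, dtype=getattr(torch, dtype_name))
```

Under these assumptions, a benchmark step would run as `with precision_context("amp_bf16", "cpu"): output = model(inputs)`, while `fp32` and `bf16` fall through to a no-op context.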