cc4d824d - Add fp16 amp mode support for all models (#776)

Add fp16 amp mode support for all models (#776)

Summary: This PR adds fp16 amp mode to all models. Basically, it wraps every eval test in an autocast context:

```
with torch.cuda.amp.autocast():
    eval()
```

I have the following concerns regarding the current amp mode:

1. Some models don't support it.

   Example: BERT_pytorch, attention_is_all_you_need_pytorch

   Reproduction:

   ```
   $ python run.py BERT_pytorch -d cuda --fp16 amp
   Running eval method from BERT_pytorch on cuda in eager mode.
     File "/fsx/users/xzhao9/benchmark/torchbenchmark/models/BERT_pytorch/bert_pytorch/model/attention/single.py", line 19, in forward
       scores = scores.masked_fill(mask == 0, -1e9)
   RuntimeError: value cannot be converted to type at::Half without overflow
   ```

2. Some models don't return correct results.

   Example: dlrm, moco, pyhpc_turbulent_kinetic_energy

   Reproduction:

   ```
   $ python run.py pyhpc_turbulent_kinetic_energy -d cuda --fp16 amp
   Running eval method from pyhpc_turbulent_kinetic_energy on cuda in eager mode.
   GPU Time: 7.316 milliseconds
   CPU Dispatch Time: 7.251 milliseconds
   CPU Total Wall Time: 7.350 milliseconds
   Correctness: 0.000000000000000
   ```

3. About two thirds of the models slightly regress in performance in amp mode.

   Example: squeezenet1_1, alexnet

   Reproduction:

   ```
   $ python run.py alexnet -d cuda --fp16 amp
   Running eval method from alexnet on cuda in eager mode.
   GPU Time: 1.475 milliseconds
   CPU Dispatch Time: 1.305 milliseconds
   CPU Total Wall Time: 1.509 milliseconds
   Correctness: 0.999999880790710
   ```

   ```
   $ python run.py alexnet -d cuda --fp16 no
   Running eval method from alexnet on cuda in eager mode.
   GPU Time: 1.095 milliseconds
   CPU Dispatch Time: 0.994 milliseconds
   CPU Total Wall Time: 1.126 milliseconds
   ```

   The slowdown is 0.74x (1.095 ms vs. 1.475 ms GPU time).

Pull Request resolved: https://github.com/pytorch/benchmark/pull/776
Reviewed By: ejguan
Differential Revision: D34559508
Pulled By: xuzhao9
fbshipit-source-id: cf585aac5e5eaedbcdca9e8292420a8beae82481
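The RuntimeError in concern 1 is a numeric-range issue: IEEE 754 half precision (fp16) tops out at a finite value of 65504, so the `-1e9` fill value used in the attention mask cannot be represented once autocast moves the tensor to `at::Half`. A minimal stdlib-only sketch of the range limit (no PyTorch needed; `fits_in_fp16` is a hypothetical helper, not part of the benchmark code):

```python
import struct

def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as an IEEE 754 half without overflow.

    struct's 'e' format code is half precision; packing an out-of-range
    value raises OverflowError, mirroring PyTorch's
    "value cannot be converted to type at::Half without overflow".
    """
    try:
        struct.pack('e', x)
        return True
    except OverflowError:
        return False

print(fits_in_fp16(65504.0))  # True: exactly the fp16 finite maximum
print(fits_in_fp16(-1e9))     # False: the masked_fill fill value overflows
```

A common workaround in models is to fill masks with the dtype's own minimum (e.g. `torch.finfo(scores.dtype).min`) instead of a hard-coded `-1e9`, so the value adapts under autocast.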