benchmark
37fa676a - Fix BERT_pytorch model on bf16. (#2185)

Commit
2 years ago
Summary: The `get_module()` implementation of BERT_pytorch is buggy because it returns only part of the computation involved in `train()` and `eval()`. As a result, when running in `bf16` precision, only part of the model is converted to `bf16`, which does not work well with the rest of the model running in `eval()`. This fix returns the entire model from `get_module()` and fixes the bug when running with bf16 precision in both eager and pt2 modes.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2185

Test Plan:
```
$ python run.py BERT_pytorch -d cuda --precision bf16 --torchdynamo inductor
Running eval method from BERT_pytorch on cuda in dynamo inductor mode with input batch size 32 and precision bf16.
GPU Time per batch: 14.625 milliseconds
CPU Wall Time per batch: 14.663 milliseconds
CPU Wall Time: 14.663 milliseconds
Time to first batch: 3477.1816 ms
GPU 0 Peak Memory: 3.8965 GB
CPU Peak Memory: 0.7637 GB
PT2 Compilation time: 33.701 seconds
```

Reviewed By: HDCharles

Differential Revision: D54621014

Pulled By: xuzhao9

fbshipit-source-id: abfeae48c92f0d4b437c8111e7f1e3a37e088876
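The failure mode described in the summary can be illustrated with a small toy sketch. It deliberately avoids PyTorch and is not the BERT_pytorch code: `Layer`, `ToyModel`, `BuggyBenchmark`, `FixedBenchmark`, and `run_eval` are all hypothetical names, and the dtype check only mimics how a framework might reject mixed precision. The assumption is that the harness converts whatever `get_module()` returns and nothing else.

```python
class Layer:
    """Hypothetical stand-in for a layer; tracks only its parameter dtype."""

    def __init__(self, name, dtype="fp32"):
        self.name = name
        self.dtype = dtype

    def to(self, dtype):
        self.dtype = dtype
        return self

    def __call__(self, x_dtype):
        # Mimic a framework refusing to mix input and parameter dtypes.
        if x_dtype != self.dtype:
            raise TypeError(
                f"{self.name}: input is {x_dtype}, parameters are {self.dtype}"
            )
        return self.dtype


class ToyModel:
    """Hypothetical two-stage model: an encoder followed by a head."""

    def __init__(self):
        self.layers = [Layer("encoder"), Layer("head")]

    def to(self, dtype):
        for layer in self.layers:
            layer.to(dtype)
        return self

    def __call__(self, x_dtype):
        for layer in self.layers:
            x_dtype = layer(x_dtype)
        return x_dtype


class BuggyBenchmark:
    def __init__(self):
        self.model = ToyModel()

    def get_module(self):
        # Bug: exposes only part of the computation to the harness.
        return self.model.layers[0]


class FixedBenchmark(BuggyBenchmark):
    def get_module(self):
        # Fix: expose the entire model so every layer gets converted.
        return self.model


def run_eval(benchmark_cls, dtype="bf16"):
    """Assumed harness behavior: convert get_module(), then run the full model."""
    bench = benchmark_cls()
    bench.get_module().to(dtype)
    try:
        return bench.model(dtype)
    except TypeError as err:
        return f"error: {err}"


print(run_eval(BuggyBenchmark))
print(run_eval(FixedBenchmark))
```

With the buggy variant only the encoder is converted, so the head still holds `fp32` parameters and rejects the `bf16` activations. Returning the whole model lets a single conversion reach every layer.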