Fix e2e hf_bert model (#1291)
Summary:
We need to fix the hf_bert model to adapt to code changes upstream and in the distributed trainer.
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1291
Test Plan:
```
$ python run_e2e.py hf_bert -t train
{"device": "cuda", "device_num": 1, "test": "train", "num_examples": 8576, "batch_size": 32, "result": {"latency": 5.36891676, "qps": 1597.3427012118548}}
```
```
$ python run_e2e.py hf_bert -t eval
{"device": "cuda", "device_num": 1, "test": "eval", "num_examples": 1043, "batch_size": 1, "result": {"latency": 13.123098491999999, "qps": 79.47818121123038}}
```
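The reported `qps` in both runs appears to be `num_examples / latency`; a minimal sanity check of that relation against the numbers above (assumption: this is how run_e2e.py derives the metric):

```python
import math

# Result rows copied from the test plan output above.
train = {"num_examples": 8576, "latency": 5.36891676, "qps": 1597.3427012118548}
evald = {"num_examples": 1043, "latency": 13.123098491999999, "qps": 79.47818121123038}

for row in (train, evald):
    # qps (queries per second) should equal examples processed divided by wall time.
    assert math.isclose(row["num_examples"] / row["latency"], row["qps"], rel_tol=1e-9)
```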
Reviewed By: yanboliang
Differential Revision: D41141811
Pulled By: xuzhao9
fbshipit-source-id: 984e718d8f27c62d5dde279f700e0d8f822352d6