3d672dde - Add the hf_bert E2E benchmark (#771)

Summary: This PR adds the first end-to-end workload, hf_bert, to the suite. It:

- Supports both train and inference
- By default, uses `amp.autocast()` for fp16 train/inference
- Currently reports latency and QPS as performance metrics
- Doesn't support multi-GPU workloads yet (support is planned)

To run the benchmark, use: `python run_e2e.py hf_bert -t eval --fp16 [no|amp]`. For example, on an A100:

```
$ python run_e2e.py hf_bert -t eval
{"device": "cuda", "device_num": 1, "test": "eval", "num_examples": 1043, "batch_size": 1, "result": {"latency": 14.56970322, "qps": 71.58690772563314}}
$ python run_e2e.py hf_bert -t train
{"device": "cuda", "device_num": 1, "test": "train", "num_examples": 8576, "batch_size": 32, "result": {"latency": 36.95959081, "qps": 232.03720095514768}}
```

Pull Request resolved: https://github.com/pytorch/benchmark/pull/771

Reviewed By: erichan1

Differential Revision: D34529471

Pulled By: xuzhao9

fbshipit-source-id: a9f8b43c9e4e4ff30dfd76c1c88fe3948976fbd2
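The `latency` and `qps` fields in the JSON output above follow the simple relation `qps = num_examples / latency`. A minimal, framework-free sketch of how such a harness might time a workload end to end and emit that report (the `run_workload` callable and `benchmark` helper here are hypothetical stand-ins, not the actual `run_e2e.py` internals; the real suite runs a PyTorch model, optionally under `amp.autocast()`):

```python
import json
import time


def benchmark(run_workload, num_examples, batch_size, test="eval", device="cuda"):
    """Time a workload once, end to end, and report latency (seconds) and QPS.

    `run_workload` is a hypothetical callable standing in for the real
    model train/eval loop in the benchmark suite.
    """
    start = time.perf_counter()
    run_workload()
    latency = time.perf_counter() - start
    return {
        "device": device,
        "device_num": 1,
        "test": test,
        "num_examples": num_examples,
        "batch_size": batch_size,
        # QPS is derived directly from wall-clock latency.
        "result": {"latency": latency, "qps": num_examples / latency},
    }


# Dummy workload: sleep briefly to simulate inference over 1043 examples.
report = benchmark(lambda: time.sleep(0.05), num_examples=1043, batch_size=1)
print(json.dumps(report))
```

In the real harness the timed region would wrap the full inference or training loop on the GPU, which is why batch size and example count differ between the `eval` and `train` runs shown above.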