Support probing the best batch size for inference tests (#1027)
Summary:
Add `proper_bs` to find the best batch_size for the current device.
Users can specify `--proper_bs` for `run_sweep.py` to enable this feature.
The final output file will contain the per-batch-size details and the optimal batch_size in the `results` section, as in the following example.
```json
"results": {
"details": [
{
"batch_size": 1,
"latency_ms": 49.924044499999994,
"tflops": 0.726485308717901
},
{
"batch_size": 2,
"latency_ms": 26.455478749999997,
"tflops": 1.4638535345863946
},
{
"batch_size": 3,
"latency_ms": 19.817013499999998,
"tflops": 1.8383686961789516
}
],
"optimal_latency_bs": 3,
"optimal_tflops_bs": 3
}
```
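Below is a minimal sketch of how the `optimal_latency_bs` and `optimal_tflops_bs` fields could be derived from the per-batch-size measurements shown above. The function name and structure are illustrative only and do not reflect the actual `run_sweep.py` implementation.
```python
# Hypothetical helper: pick the optimal batch sizes from the probed measurements.
# Names here are illustrative, not the actual run_sweep.py code.
import json


def pick_optimal_batch_sizes(details):
    """Given a list of {"batch_size", "latency_ms", "tflops"} entries,
    return the batch size with the lowest latency and the one with the
    highest TFLOPS."""
    optimal_latency_bs = min(details, key=lambda d: d["latency_ms"])["batch_size"]
    optimal_tflops_bs = max(details, key=lambda d: d["tflops"])["batch_size"]
    return optimal_latency_bs, optimal_tflops_bs


if __name__ == "__main__":
    # Example data matching the output format above.
    details = [
        {"batch_size": 1, "latency_ms": 49.92, "tflops": 0.73},
        {"batch_size": 2, "latency_ms": 26.46, "tflops": 1.46},
        {"batch_size": 3, "latency_ms": 19.82, "tflops": 1.84},
    ]
    latency_bs, tflops_bs = pick_optimal_batch_sizes(details)
    print(json.dumps({"optimal_latency_bs": latency_bs, "optimal_tflops_bs": tflops_bs}))
```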
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1027
Reviewed By: xuzhao9
Differential Revision: D37802560
Pulled By: FindHao
fbshipit-source-id: a43a0cc634aec9643e6f9d3f2ada79cc00ba22f6