print per block avg time when running on AI-PEP machines (#28838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28838
as title
Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test -- --ai_pep_format true
Total time: 02:36.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
"""
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.83197245048359"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.839232977246866"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.7970924858236685"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.708389271399938"}
# Benchmarking PyTorch: Softmax
...
Reviewed By: hl475
Differential Revision: D18202504
fbshipit-source-id: 4a332763432b3b5886f241bb2ce49d4df481a6f3