Check the stableness of model tests in userbenchmarks (#1339)
Summary:
This PR adds an `experiment` directory, which exposes APIs that users can call from their userbenchmarks.
The first userbenchmark to use this API is `model-stableness`. It checks the maximum relative latency delta, computed as (max - min) / min over the measured latencies of a TorchBench model test.
Example run:
```
$ python run_benchmark.py model-stableness -d cpu -t eval -m alexnet
[{'cfg': {'name': 'alexnet', 'device': 'cpu', 'test': 'eval', 'batch_size': None, 'jit': False, 'extra_args': [], 'extra_env': None}, 'raw_metrics': {'latencies': [1832.564, 1828.382, 1832.541, 1827.971, 1834.978, 1806.657, 1846.991, 1842.245, 1832.917, 1820.841, 1850.434, 1875.838, 1888.495, 1811.257, 1854.502, 1845.449, 1843.788, 1839.329, 1837.32, 1868.965]}, 'max_delta': 0.04529802834738413}]
```
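For reference, the `max_delta` value in the output above is consistent with (max - min) / min applied to the listed latencies. A minimal sketch of that calculation follows; the helper name `max_delta` and its input checks are illustrative assumptions, not necessarily how the code in this PR is written:
```python
from typing import Sequence


def max_delta(latencies: Sequence[float]) -> float:
    """Relative latency spread: (max - min) / min.

    Hypothetical helper for illustration only; the PR's actual
    implementation may be structured differently.
    """
    if not latencies:
        raise ValueError("latencies must be non-empty")
    lo, hi = min(latencies), max(latencies)
    if lo <= 0:
        raise ValueError("latencies must be positive")
    return (hi - lo) / lo


# Latencies copied from the example run above.
latencies = [
    1832.564, 1828.382, 1832.541, 1827.971, 1834.978,
    1806.657, 1846.991, 1842.245, 1832.917, 1820.841,
    1850.434, 1875.838, 1888.495, 1811.257, 1854.502,
    1845.449, 1843.788, 1839.329, 1837.32, 1868.965,
]

# Prints 0.0453, matching the reported 'max_delta' to four decimal places.
print(f"{max_delta(latencies):.4f}")
```
Here the slowest run (1888.495) is about 4.5% slower than the fastest (1806.657).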
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1339
Test Plan:
https://github.com/pytorch/benchmark/actions/runs/3659455929
GPU workflow: https://github.com/pytorch/benchmark/actions/runs/3709191583
Reviewed By: weiwangmeta
Differential Revision: D41847360
Pulled By: xuzhao9
fbshipit-source-id: 7ace216b21bfe67795c3d023c32353dfc07cc9ae