Add binary to benchmark model load speed (#74700)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74700
Test Plan:
Imported from OSS
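For context, a minimal sketch of the kind of load-timing loop such a binary performs, emitting the same PyTorchObserver lines shown in the results below. The model path, iteration count, and use of `torch::jit::load` are illustrative assumptions; the actual binary in this PR may use the mobile/lite-interpreter loader instead.
```cpp
// Hypothetical sketch (not the binary added in this PR): time repeated
// torch::jit::load() calls and print PyTorchObserver-style JSON lines.
#include <torch/script.h>

#include <chrono>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "usage: " << argv[0] << " <model.pt> [iterations]" << std::endl;
    return 1;
  }
  const std::string model_path = argv[1];
  const int iterations = argc > 2 ? std::stoi(argv[2]) : 10;

  for (int i = 0; i < iterations; ++i) {
    const auto start = std::chrono::high_resolution_clock::now();
    // Load the TorchScript model from disk on every iteration so each
    // measurement covers the full load path.
    torch::jit::Module module = torch::jit::load(model_path);
    const auto end = std::chrono::high_resolution_clock::now();
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    // Same observer format as the results below.
    std::cout << "PyTorchObserver {\"type\": \"NET\", \"unit\": \"us\", "
              << "\"metric\": \"latency\", \"value\": \"" << us << "\"}"
              << std::endl;
  }
  return 0;
}
```
Note that the first iteration is typically much slower than the rest (cold file cache, one-time backend setup), which is visible in all of the result sets below.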
Some results from running this benchmark on a quantized CPU xirp14b model on a Pixel 5:
```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "46749"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19261"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19235"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19396"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19486"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19562"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19566"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19559"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19632"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19938"}
```
Some results from running this benchmark on the Vulkan xirp20a model on a Pixel 5, after pre-loading the Context:
```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "38664"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19921"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20316"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20255"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20219"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20329"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20463"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "21072"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20668"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20889"}
```
Without pre-loading the Context:
```
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "70850"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "19867"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20211"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20039"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20082"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20268"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20363"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "21103"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20511"}
PyTorchObserver {"type": "NET", "unit": "us", "metric": "latency", "value": "20528"}
```
Reviewed By: mrshenli
Differential Revision: D35124881
Pulled By: SS-JIA
fbshipit-source-id: 0f093e4aa45d69c538a4fe2003e0d5617d72b97a
(cherry picked from commit 96f991420ad720300aea51cc0a1a6c0f79d2820b)