Add runner for instruction count benchmarks. (#54652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54652
This PR adds a fairly robust runner for the instruction count microbenchmarks. Key features are:
* Timeout and retry. (In rare cases, Callgrind will hang under heavy load.)
* Robust error handling and keyboard interrupt support.
* Benchmarks are pinned to cores. (Wall times will still be noisy, but pinning helps.)
* Progress printouts, including a rough ETA.
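
As an illustration of the features listed above, here is a minimal sketch of timeout-and-retry around a subprocess, optional core pinning, and a rough ETA. It is not the code added in this PR: `run_with_retry`, `eta_seconds`, and all of their parameters are hypothetical names chosen for this example, and pinning via `os.sched_setaffinity` is an assumption that only applies on Linux.

```python
import os
import subprocess
import sys
import time


def run_with_retry(cmd, timeout_s, max_attempts=3, core=None):
    """Run `cmd`; if it exceeds `timeout_s` seconds, kill it and try again.

    Hypothetical helper for illustration only. `core` is an optional CPU
    index to pin the child process to (Linux only).
    """
    def pin():
        # Executed in the child process just before exec.
        os.sched_setaffinity(0, {core})

    can_pin = core is not None and hasattr(os, "sched_setaffinity")
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child when the timeout expires.
            return subprocess.run(
                cmd,
                timeout=timeout_s,
                check=True,
                capture_output=True,
                text=True,
                preexec_fn=pin if can_pin else None,
            )
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}/{max_attempts} timed out", file=sys.stderr)
    raise RuntimeError(f"{cmd!r} did not finish after {max_attempts} attempts")


def eta_seconds(start_time, done, total, now=None):
    """Rough ETA: assume remaining tasks take the average time seen so far."""
    if done == 0:
        return None  # No data yet, so no estimate.
    now = time.monotonic() if now is None else now
    return (now - start_time) / done * (total - done)
```

A caller would loop over its benchmark commands, pass each to `run_with_retry`, and print `eta_seconds(...)` after each one completes. The linear ETA is deliberately crude; it only assumes that the remaining tasks cost about as much as the finished ones.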
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27537823
Pulled By: robieta
fbshipit-source-id: 699ac907281d28bf7ffa08594253716ca40204ba