[Static Runtime] Make per-op latency readable by FAI-PEP (#64315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64315
Add a new flag `generate_ai_pep_output` to `StaticRuntime::benchmark`. If set, produces per-op-kind average total latency in milliseconds in a JSON format recognized by [Facebook AI performance evaluation platform (FAI-PEP)](https://github.com/facebook/FAI-PEP).
This is useful for observing the impact of changes that make a big difference for a specific op, but do not affect the overall SR latency by more than a few percent.
Reviewed By: hlu1
Differential Revision: D30679352
fbshipit-source-id: c847fa6ea20774aaf1e7949b11db4421d1f70b7e