[pytorch][cupti profiler 6/n] Changes to configure Kineto cupti profiler from pytorch profiler interface (#75616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75616
Kineto introduced a new profiler that reads performance counters from NVIDIA GPUs using the CUPTI Range Profiler API.
This change adds support for configuring that Kineto range profiler mode through the PyTorch profiler interface.
Example:
```
import torch
from torch.profiler import ProfilerActivity

# trace_handler, train_batch, and modeldef are user-defined.
with torch.profiler.profile(
    activities=[ProfilerActivity.CUDA],
    record_shapes=True,
    on_trace_ready=trace_handler,
    experimental_config=torch.profiler._ExperimentalConfig(
        profiler_metrics=[
            "kineto__tensor_core_insts",
            "dram__bytes_read.sum",
            "dram__bytes_write.sum",
        ],
        profiler_measure_per_kernel=False,
    ),
) as prof:
    res = train_batch(modeldef)
    prof.step()
```
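The example leaves `trace_handler` undefined. A minimal sketch of one possible handler follows, assuming the callback receives the profiler object and that exporting a Chrome trace per step is what is wanted. `TRACE_DIR` and the file naming are hypothetical and not part of this change.
```python
import os
import tempfile

# Hypothetical output directory, chosen only for illustration.
TRACE_DIR = os.path.join(tempfile.gettempdir(), "kineto_traces")


def trace_handler(prof):
    """Export a Chrome trace for the profiler step that just finished."""
    os.makedirs(TRACE_DIR, exist_ok=True)
    # step_num is the profiler's current step counter.
    path = os.path.join(TRACE_DIR, f"trace_step_{prof.step_num}.json")
    prof.export_chrome_trace(path)
    return path
```
The profiler ignores the return value; it is returned here only so the path is easy to log or inspect.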
## Details
* Introduce a new structure `KinetoProfilerConfig` so users can configure Kineto-specific options while the profiler API stays consistent.
* Populate configuration options for Kineto.
Test Plan: CI and tested on resnet50
Reviewed By: robieta
Differential Revision: D34489487
fbshipit-source-id: 8ef82d2593f4f4d5824ca634f7d25507bc572caa
(cherry picked from commit 4a2af70629db55a605d4b8d0a54d41df2b247183)