[profiler] add option for kineto synchronization events in the trace (#105187)
Summary:
## About Sync Events
For CUDA profiling mode, we can enable tracing CUDA synchronization events.
* This feature captures synchronization events in CUDA including 1) context/device sync, 2) stream sync, 3) CUDA event sync, 4) CUDA stream wait event (inter stream synchronization). Read more
* We add this flag using the profiler's experimental config option.
* This PR relies on https://github.com/pytorch/kineto/commit/7b003638c6d65537ac9678c101f6e9adf013b157 change in pytorch/kineto
## Usage
Just set the `enable_cuda_sync_events` option in `_ExperimentalConfig`
```
from torch.autograd.profiler import profile, _ExperimentalConfig
with profile(use_kineto=True, use_cuda=True,
experimental_config=_ExperimentalConfig(enable_cuda_sync_events=True),
) as prof:
workload()
```
**Please wait for PyTorch github repo to point to https://github.com/pytorch/kineto/commit/7b003638c6d65537ac9678c101f6e9adf013b157 or later commit in Kineto**
Test Plan:
## Unit Test
Added a unit test
buck2 test mode/dev-nosan caffe2/test:profiler --local-only -- test_profiler_cuda_sync_events
Tests finished: Pass 1. Fail 0. Fatal 0. Skip 0. Build failure 0
ttps://www.internalfb.com/intern/testinfra/testrun/281475298097379
Reviewed By: davidberard98
Differential Revision: D46244591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105187
Approved by: https://github.com/aaronenyeshi