Collect CUDA/CPU profiling info into result sheets. (#5921)
* Collect CUDA/CPU profiling info into result sheets.
This PR:
0. Adds CUDA/CPU collection capabilities to the script.
1. Modifies result_analyzer.py to analyze newly collected results.
2. Moves CUDA synchronize/XLA device synchronize into the profiler.
3. Fixes list type annotations for compatibility with Python 3.8+.
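For illustration only, items 2 and 3 can be sketched as follows. This is not the code from this PR: `timed_runs_s`, `step`, and `sync` are hypothetical names. The assumption is that `sync` stands in for a device barrier such as `torch.cuda.synchronize()`, and that the typing fix amounts to using `typing.List[...]`, since the built-in `list[...]` form is not subscriptable on Python 3.8.

```python
import time
from typing import Callable, List  # List[...] works on 3.8; list[...] needs 3.9+


def timed_runs_s(step: Callable[[], None],
                 sync: Callable[[], None],
                 repeat: int) -> List[float]:
  """Return per-run wall-clock durations in seconds (hypothetical helper)."""
  durations_s: List[float] = []
  for _ in range(repeat):
    sync()  # drain pending device work before starting the clock
    start = time.perf_counter()
    step()
    sync()  # wait for the work launched by step() before stopping the clock
    durations_s.append(time.perf_counter() - start)
  return durations_s
```

Keeping both barriers inside the measuring helper means callers cannot forget them, so asynchronous device work is not left out of the measurement.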
Tested with command:
python3 xla/benchmarks/experiment_runner.py --dynamo=openxla --xla=PJRT --test=train --filter=basic_gnn_gcn$ --suite-name=torchbench --accelerator=cuda --progress-bar --output-dirname=/tmp/output --repeat=2 --print-subprocess --no-resume --profile-cuda-cpu-collect --profile-cuda
python3 xla/benchmarks/result_analyzer.py --output-dir=/tmp/output
* Lint, and add _s suffix to metrics
---------
Co-authored-by: root <root@olechwierowicz9.zrh.corp.google.com>