Add ncu report analyzer (#2497)

Commit

1 year ago

Add ncu report analyzer (#2497)

Summary:
This PR adds an ncu report analyzer that analyzes the profiled ncu report. It also adds two metrics, `memory_traffic` and `arithmetic_intensity`. To avoid excessive profiling overhead, we profile with only the necessary ncu metrics.

This PR is part of the [operator benchmarking plan](https://github.com/pytorch/pytorch/issues/136168).

Example command:
```
python run_benchmark.py triton --op gather_gemv --num-inputs 1 --metrics memory_traffic,arithmetic_intensity --csv
```

Example output:
```
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 508958 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "index_elementwise_kernel" - 0: 0%....50%....100% - 3 passes
==PROF== Profiling "unrolled_elementwise_kernel" - 1: 0%....50%....100% - 3 passes
==PROF== Profiling "gemv2T_kernel_val" - 2: 0%....50%....100% - 3 passes
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.89s/it]
x_val;test_eager-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 508958
==WARNING== No source files were imported. Check that the target application was compiled with -lineinfo.
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_eager_0/ncu_output.ncu-rep
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509121 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00, 3.79s/it]
x_val;test_0-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 509121
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_0_0/ncu_output.ncu-rep
0%| | 0/1 [00:00<?, ?it/s]==PROF== Connected to process 509285 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
==PROF== Profiling "triton_red_fused_mv_0" - 0: 0%....50%....100% - 3 passes
==PROF== Connected to process 509433 (/scratch/yhao/miniconda3/envs/pta_gil/bin/python3.10)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:04<00:00, 4.07s/it]
x_val;test_inductor-_ncu_trace_in_task
2048;success
==PROF== Disconnected from process 509285
==PROF== Disconnected from process 509433
==PROF== Report: /scratch/yhao/tmp/tritonbench/gather_gemv/ncu_traces/test_inductor_0/ncu_output.ncu-rep
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:23<00:00, 23.99s/it]
x_val;test_eager-arithmetic_intensity;test_eager-memory_traffic;test_eager-weighted_fp32_arithmetic_intensity;test_0-arithmetic_intensity;test_0-memory_traffic;test_0-weighted_fp32_arithmetic_intensity;test_inductor-arithmetic_intensity;test_inductor-memory_traffic;test_inductor-weighted_fp32_arithmetic_intensity
2048;(0.14937214493924472, 0.0);(29467392.0, 505856.0);0.14937214493924472;(4.364079147640791, 0.0);(4204544.0, 256.0);4.364079147640791;(9.97989888530182, 0.0);(4202752.0, 0.0);9.97989888530182
```

According to ncu, there can be multiple roofline charts at different granularities, such as single precision, double precision, tensor core, and half precision.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2497

Reviewed By: xuzhao9

Differential Revision: D64359055

Pulled By: FindHao

fbshipit-source-id: a02a4ebfcac5c5209f4196aac5a8eb4f91b3de87
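The final semicolon-separated table in the example output can be loaded for further analysis. The following is a minimal sketch, not part of this PR: `parse_metrics_csv` is a hypothetical helper name, and the sample text is copied from the output above. The commit message does not say what the two elements of each tuple represent, so the sketch keeps them as plain tuples without interpreting them.

```python
import ast
import csv
import io

# Header and data row copied from the example output in the commit message.
SAMPLE = (
    "x_val;test_eager-arithmetic_intensity;test_eager-memory_traffic;"
    "test_eager-weighted_fp32_arithmetic_intensity;test_0-arithmetic_intensity;"
    "test_0-memory_traffic;test_0-weighted_fp32_arithmetic_intensity;"
    "test_inductor-arithmetic_intensity;test_inductor-memory_traffic;"
    "test_inductor-weighted_fp32_arithmetic_intensity\n"
    "2048;(0.14937214493924472, 0.0);(29467392.0, 505856.0);0.14937214493924472;"
    "(4.364079147640791, 0.0);(4204544.0, 256.0);4.364079147640791;"
    "(9.97989888530182, 0.0);(4202752.0, 0.0);9.97989888530182\n"
)


def parse_metrics_csv(text):
    """Hypothetical helper: parse the ';'-delimited metrics table.

    Each cell is a Python literal (an int, a float, or a tuple of floats),
    so ast.literal_eval converts it safely without executing code.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {column: ast.literal_eval(cell) for column, cell in row.items()}
        for row in reader
    ]


rows = parse_metrics_csv(SAMPLE)
for column, value in rows[0].items():
    print(f"{column}: {value}")
```

Because the delimiter is `;`, the commas inside the tuple cells do not split the fields, which is why the standard `csv` module is enough here.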

Author

FindHao

Committer

facebook-github-bot

Parents

db41e776

benchmark 21cc30dc - Add ncu report analyzer (#2497)
