onnxruntime
25d7a4fa - [CUDA] Update benchmark_mha.py to capture debug info to identify sdpa kernel (#21804)

Commit

1 year ago

Use debug info to identify the SDPA kernel that is actually used, and show it in the output of benchmark_mha.py. This updated benchmark script was used to get the benchmark results in https://github.com/microsoft/onnxruntime/pull/21629.

(1) Change the output format of the debug info to lines like SdpaKernel=*.
(2) Add a step to capture stdout from the onnxruntime session, and use a regular expression to parse SdpaKernel=* from the captured text.

Other minor changes:
(1) Set different default repeats during benchmark: 100 for CPU and 10000 for CUDA.
(2) Fix PrintTensorByDims used in the console dumper: if the dumper is not enabled, do not dump the tensor.
(3) Update some comments.

### Motivation and Context
Sometimes a fallback kernel runs in place of the requested sdpa_kernel. This can confuse users unless the benchmark reports exactly which kernel was used.
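A minimal sketch of the capture-and-parse step, written under assumptions rather than copied from benchmark_mha.py. Debug info printed by native (C++/CUDA) code goes straight to file descriptor 1 and bypasses Python's sys.stdout, so this sketch redirects the descriptor with os.dup2 instead of using contextlib.redirect_stdout. The helper names capture_fd_stdout and parse_sdpa_kernel, and the kernel names in the demo, are hypothetical.

```python
import os
import re
import sys
import tempfile
from contextlib import contextmanager

# Assumed format: debug lines contain tokens such as "SdpaKernel=FLASH_ATTENTION".
SDPA_KERNEL_PATTERN = re.compile(r"SdpaKernel=(\w+)")


@contextmanager
def capture_fd_stdout():
    """Hypothetical helper: capture everything written to fd 1, including native output."""
    captured = {"text": ""}
    sys.stdout.flush()  # keep earlier Python output out of the capture
    saved_fd = os.dup(1)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)  # point fd 1 at the temp file
        try:
            yield captured
        finally:
            sys.stdout.flush()  # push buffered Python prints into the temp file
            os.dup2(saved_fd, 1)  # restore the original stdout
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode("utf-8", errors="replace")


def parse_sdpa_kernel(text):
    """Hypothetical helper: distinct SdpaKernel=* values, in order of first appearance."""
    kernels = []
    for name in SDPA_KERNEL_PATTERN.findall(text):
        if name not in kernels:
            kernels.append(name)
    return kernels


if __name__ == "__main__":
    with capture_fd_stdout() as captured:
        # Stand-in for session.run(...): write to fd 1 directly, as native code would.
        os.write(1, b"other log text\nSdpaKernel=FLASH_ATTENTION\n")
    print(parse_sdpa_kernel(captured["text"]))  # ['FLASH_ATTENTION']
```

Native output still held in a C stdio buffer when the block exits may be missed; a real script may need to flush it (for example, by calling fflush(NULL) through ctypes) before the descriptor is restored.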

References

#21804 - [CUDA] Update benchmark_mha.py to capture debug info to identify sdpa kernel

Author

tianleiwu

Parents

44a3923b
