onnxruntime
879ec039 - Add enable_profiling in runoptions (#26846)

8 days ago
### Description

Support run-level profiling.

This PR adds support for profiling individual `Run` executions, similar to session-level profiling. Developers can enable run-level profiling by setting `enable_profiling` and `profile_file_prefix` in `RunOptions`. Once the run completes, a JSON profiling file is saved under the name `profile_file_prefix` + timestamp.

<img width="514" height="281" alt="png (2)" src="https://github.com/user-attachments/assets/8a997068-71d9-49ed-8a5c-00e0fa8853af" />

### Key Changes

1. Introduced a local variable `run_profiler` in `InferenceSession::Run`, which is destroyed after the run completes. Using a dedicated profiler per run keeps profiling data isolated and prevents interleaving or corruption across runs.
2. To keep execution times accurate when both session-level and run-level profiling are enabled, overloaded `Start` and `EndTimeAndRecordEvent` functions have been added. These let the caller provide timestamps instead of relying on `std::chrono::high_resolution_clock::now()`, so both profilers record the same clock readings rather than two slightly different ones.
3. Added a TLS variable `tls_run_profiler_` to support run-level profiling with the WebGPU Execution Provider (EP). When multiple threads enable run-level profiling, each thread logs only to its own WebGPU profiler, keeping thread-specific data isolated.
4. Use `HH:MM:SS.mm` instead of `HH:MM:SS` in the JSON filename to prevent conflicts when profiling multiple consecutive runs.

### Motivation and Context

Previously, profiling was available only at the session level. Developers sometimes want to profile a specific run, which is what this PR enables.

### Some details

When profiling is enabled via `RunOptions`, it should ideally collect two types of events:

1. Profiler events: used to calculate the CPU execution time of each operator.
2. Execution Provider (EP) profiler events: used to measure GPU kernel execution time.
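
To illustrate key change 2, here is a minimal, self-contained sketch of why caller-supplied timestamps help when two profilers time the same work. `ToyProfiler` and `ProfileForBoth` are hypothetical names invented for this example. They only mirror the idea of the overloaded `Start` and `EndTimeAndRecordEvent` functions and are not the onnxruntime signatures.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using TimePoint = Clock::time_point;

// Hypothetical stand-in for a profiler (NOT the onnxruntime class).
struct ToyProfiler {
  struct Event {
    std::string name;
    long long duration_us;
  };
  std::vector<Event> events;

  // Original style: the profiler reads the clock itself.
  TimePoint Start() const { return Clock::now(); }
  // Overload: the caller supplies the timestamp.
  TimePoint Start(TimePoint now) const { return now; }

  // Original style: the end time is read internally.
  void EndTimeAndRecordEvent(const std::string& name, TimePoint start) {
    EndTimeAndRecordEvent(name, start, Clock::now());
  }
  // Overload: the caller supplies the end time as well.
  void EndTimeAndRecordEvent(const std::string& name, TimePoint start,
                             TimePoint end) {
    assert(!name.empty());
    const auto us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    events.push_back({name, static_cast<long long>(us.count())});
  }
};

// Times one piece of work for two profilers, reading the clock only once
// at each boundary so both record exactly the same duration.
template <typename Work>
void ProfileForBoth(ToyProfiler& session_profiler, ToyProfiler& run_profiler,
                    const std::string& name, Work&& work) {
  const TimePoint begin = Clock::now();
  const TimePoint session_start = session_profiler.Start(begin);
  const TimePoint run_start = run_profiler.Start(begin);
  work();
  const TimePoint end = Clock::now();
  session_profiler.EndTimeAndRecordEvent(name, session_start, end);
  run_profiler.EndTimeAndRecordEvent(name, run_start, end);
}
```

If each profiler called `Clock::now()` on its own, the second reading would include the time spent recording into the first profiler, so the two traces would disagree slightly. Passing one shared timestamp removes that skew.
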
Unlike session-level profiling, run-level profiling must collect the correct events when multiple threads run concurrently.

For 1, this is straightforward to support (see `sequential_executor.cc`). We use a thread-local storage (TLS) variable, `RunLevelState` (defined in `profiler.h`), to maintain run-level profiling state for each thread.

For 2, each Execution Provider (EP) has its own profiler implementation, and each EP must ensure correct behavior under run-level profiling. This PR ensures that the WebGPU profiler works correctly with run-level profiling.

### Test Cases

| Scenario | Example | Expected Result |
|---------|---------|-----------------|
| Concurrent runs on the same session with different run-level profiling settings | t1: `sess1.Run({ enable_profiling: true })`<br>t2: `sess1.Run({ enable_profiling: false })`<br>t3: `sess1.Run({ enable_profiling: true })` | Two trace JSON files are generated: one for `t1` and one for `t3`. |
| Run-level profiling enabled together with session-level profiling | `sess1 = OrtSession({ enable_profiling: true })`<br>`sess1.Run({ enable_profiling: true })` | Two trace JSON files are generated: one for session-level profiling and one for run-level profiling. |
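
The concurrent-run scenario in the first table row can be sketched with plain C++ thread-local storage. This is only an illustration of the isolation idea. `RunProfiler`, `tls_run_profiler`, `RecordKernelEvent`, `Run`, and `RunConcurrently` are hypothetical names for this example and do not reproduce the onnxruntime or WebGPU EP code.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-run profiler (NOT the onnxruntime class).
struct RunProfiler {
  std::vector<std::string> events;
};

// Each thread sees its own copy of this pointer. Null means run-level
// profiling is disabled for the run currently executing on this thread.
thread_local RunProfiler* tls_run_profiler = nullptr;

// Called from "kernel" code that has no direct handle to the profiler.
void RecordKernelEvent(const std::string& name) {
  if (tls_run_profiler != nullptr) {
    tls_run_profiler->events.push_back(name);
  }
}

// Hypothetical Run: returns the events captured for this run only.
std::vector<std::string> Run(bool enable_profiling, const std::string& tag) {
  assert(tls_run_profiler == nullptr);  // no nested runs in this sketch
  RunProfiler run_profiler;             // local, destroyed when the run ends
  tls_run_profiler = enable_profiling ? &run_profiler : nullptr;
  for (int i = 0; i < 3; ++i) {
    RecordKernelEvent(tag + ":op" + std::to_string(i));
  }
  tls_run_profiler = nullptr;  // detach before the profiler goes away
  return std::move(run_profiler.events);
}

// Mirrors the table: t1 and t3 enable profiling, t2 does not.
std::vector<std::vector<std::string>> RunConcurrently() {
  const bool enabled[3] = {true, false, true};
  std::vector<std::vector<std::string>> results(3);
  std::vector<std::thread> threads;
  for (int t = 0; t < 3; ++t) {
    threads.emplace_back([&results, &enabled, t] {
      // Each thread writes to its own slot, so no locking is needed.
      results[t] = Run(enabled[t], "t" + std::to_string(t + 1));
    });
  }
  for (auto& th : threads) th.join();
  return results;
}
```

In this sketch the runs for `t1` and `t3` each end up with only their own three events, and `t2` records nothing, which corresponds to the two trace files expected in the table.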