Fix run-level profiling for subgraph operators (#27870)
### Description
Run-level profiling (introduced in PR #26846) does not currently capture
profiling events for operators inside subgraphs. This PR fixes that by
threading the `run_profiler` pointer through `OpKernelContextInternal`
to subgraph execution, following the same pattern as `terminate_flag`.
### Root Cause
`utils::ExecuteSubgraph()` had no `run_profiler` parameter and always
passed `nullptr` to `ExecuteGraphImpl`, so nested operators (inside If,
Loop, Scan, BeamSearch, GreedySearch) were never profiled at the run
level.
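To make the failure mode concrete, here is a minimal C++ model of the pre-fix control flow. `Profiler`, `Node`, and this two-argument `ExecuteGraphImpl` are hypothetical stand-ins, not onnxruntime's real types or signatures. The sketch only shows how a subgraph helper that hard-codes `nullptr` silently drops nested events.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the real onnxruntime types.
struct Profiler {
  std::vector<std::string> events;  // names of nodes that were profiled
};

struct Node {
  std::string name;
  std::vector<Node> subgraph;  // non-empty for control-flow nodes such as If
};

void ExecuteGraphImpl(const std::vector<Node>& graph, Profiler* run_profiler);

// Pre-fix shape: there is no profiler parameter, so nested execution
// always receives nullptr and records nothing.
void ExecuteSubgraph(const std::vector<Node>& subgraph) {
  ExecuteGraphImpl(subgraph, /*run_profiler=*/nullptr);
}

void ExecuteGraphImpl(const std::vector<Node>& graph, Profiler* run_profiler) {
  for (const Node& node : graph) {
    if (run_profiler != nullptr) run_profiler->events.push_back(node.name);
    if (!node.subgraph.empty()) ExecuteSubgraph(node.subgraph);
  }
}
```

Running an If-like node `if_0` whose branch contains `mul_0` with a live profiler records only `if_0`; the nested `mul_0` never appears.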
### Fix
1. **`OpKernelContextInternal`** — Added `run_profiler_` member and
`GetRunProfiler()` accessor.
2. **`SessionScope` / `ExecuteKernel()`** — Expose the run profiler via
`SessionScope::GetRunProfiler()` and pass it into `OpKernelContextInternal`.
3. **`ExecuteSubgraph()`** — Added `profiling::Profiler* run_profiler =
nullptr` parameter, forwarded to `ExecuteGraphImpl()`.
4. **Control flow ops** (`if.cc`, `loop.cc`, `scan_utils.cc`) — Pass
`context_.GetRunProfiler()` to `ExecuteSubgraph()`.
5. **Contrib transformer ops** (`beam_search_impl_gpt.h`,
`beam_search_impl_t5.h`, `beam_search_impl_whisper.h`,
`greedy_search_impl_gpt.h`) — All 8 `ExecuteSubgraph()` call sites
updated to pass `this->context_.GetRunProfiler()`.
Plugin EP control flow kernels (`PluginEpIfKernelImpl`, etc.) delegate
to the same internal kernels, so the fix propagates automatically.
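The fix can be modeled the same way. In this sketch, hypothetical `Profiler`, `Node`, and `KernelContext` types stand in for the real classes: the kernel context carries the profiler pointer, the control-flow kernel reads it with `GetRunProfiler()`, and `ExecuteSubgraph()` forwards it. It is an illustration of the threading pattern under those assumptions, not the actual onnxruntime code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; not the real onnxruntime classes or signatures.
struct Profiler {
  std::vector<std::string> events;  // names of nodes that were profiled
};

struct Node {
  std::string name;
  std::vector<Node> subgraph;  // non-empty for control-flow nodes
};

// Models OpKernelContextInternal carrying the run profiler alongside the
// state it already carries (such as the terminate flag).
class KernelContext {
 public:
  explicit KernelContext(Profiler* run_profiler) : run_profiler_(run_profiler) {}
  Profiler* GetRunProfiler() const { return run_profiler_; }

 private:
  Profiler* run_profiler_;  // nullptr when run-level profiling is off
};

void ExecuteGraph(const std::vector<Node>& graph, Profiler* run_profiler);

// Post-fix shape: the defaulted parameter keeps un-updated callers
// compiling, while updated callers forward the profiler.
void ExecuteSubgraph(const std::vector<Node>& subgraph,
                     Profiler* run_profiler = nullptr) {
  ExecuteGraph(subgraph, run_profiler);
}

// An If-like kernel forwards the profiler taken from its context.
void ComputeControlFlowNode(const Node& node, const KernelContext& context) {
  ExecuteSubgraph(node.subgraph, context.GetRunProfiler());
}

void ExecuteGraph(const std::vector<Node>& graph, Profiler* run_profiler) {
  for (const Node& node : graph) {
    if (run_profiler != nullptr) run_profiler->events.push_back(node.name);
    if (!node.subgraph.empty()) {
      // Each kernel's context is built with the run profiler.
      KernelContext context(run_profiler);
      ComputeControlFlowNode(node, context);
    }
  }
}
```

The default argument is a double-edged design choice: call sites that are not updated still compile, but they silently keep the old `nullptr` behavior, which is why every control-flow and contrib call site has to pass the profiler explicitly.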
### Tests
- **`CheckRunProfilerWithSubgraph`** (`inference_session_test.cc`) —
Enables run profiling, runs `if_mul.onnx`, and asserts that `mul_0`
(inside the If node's then-branch) appears in the profile JSON.
- **`CheckRunProfilerWithBeamSearch`** (`beam_search_test.cc`) — Enables
run profiling, runs `tiny_gpt2_beamsearch.onnx`, and asserts that the
profile JSON contains Node entries from the decoder subgraph, not just
the top-level BeamSearch op.
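Both tests come down to checking that a nested node shows up in the profile JSON. A tolerant helper for that check might look like the following. It is a hypothetical sketch, not the code in `inference_session_test.cc`, and it assumes (without confirming against the profiler's output format) that a node's event names begin with the node name, optionally followed by a suffix such as `_kernel_time`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the actual test code. Assumes node event names
// start with the node name (e.g. "mul_0_kernel_time"), so it looks for a
// quoted token that begins with node_name followed by '"' or '_'.
bool ProfileMentionsNode(const std::string& profile_json,
                         const std::string& node_name) {
  const std::string needle = "\"" + node_name;
  for (std::size_t pos = profile_json.find(needle); pos != std::string::npos;
       pos = profile_json.find(needle, pos + 1)) {
    const std::size_t after = pos + needle.size();
    // Reject longer names that merely share a prefix, such as "mul_01".
    if (after < profile_json.size() &&
        (profile_json[after] == '"' || profile_json[after] == '_')) {
      return true;
    }
  }
  return false;
}
```

For example, `ProfileMentionsNode(json, "mul_0")` accepts `"mul_0_kernel_time"` but rejects `"mul_01"`. Matching on a quoted prefix avoids pulling in a JSON parser, at the cost of also matching any other string value that starts with the same name followed by an underscore.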
### Files Changed (12 files)
| File | Change |
|------|--------|
| `core/framework/op_kernel_context_internal.h` | Added `run_profiler_` member, `GetRunProfiler()`, constructor param |
| `core/framework/sequential_executor.cc` | `SessionScope::GetRunProfiler()`, pass to `OpKernelContextInternal` |
| `core/framework/utils.h` / `utils.cc` | `run_profiler` param on `ExecuteSubgraph()` |
| `core/providers/cpu/controlflow/if.cc` | Forward `GetRunProfiler()` |
| `core/providers/cpu/controlflow/loop.cc` | Forward `GetRunProfiler()` |
| `core/providers/cpu/controlflow/scan_utils.cc` | Forward `GetRunProfiler()` |
| `contrib_ops/cpu/transformers/beam_search_impl_gpt.h` | 2 call sites |
| `contrib_ops/cpu/transformers/beam_search_impl_t5.h` | 2 call sites |
| `contrib_ops/cpu/transformers/beam_search_impl_whisper.h` | 2 call sites |
| `contrib_ops/cpu/transformers/greedy_search_impl_gpt.h` | 2 call sites |
| `test/framework/inference_session_test.cc` | `CheckRunProfilerWithSubgraph` test |
| `test/contrib_ops/beam_search_test.cc` | `CheckRunProfilerWithBeamSearch` test |