Add per-session thread pool work callbacks API (#27253)
## Description
Adds per-session thread pool work callbacks, allowing callers to hook
into the enqueue/start/stop/abandon lifecycle of thread pool work items.
The feature is gated behind a build flag
(`--enable_session_threadpool_callbacks`) with zero overhead when
disabled.
## API additions
- C API: `OrtApi::SetPerSessionThreadPoolCallbacks`, which stores an
`OrtThreadPoolCallbacksConfig` on the `OrtEnv`; the config is then applied
to per-session thread pools
- C++ wrapper: `Ort::Env::SetPerSessionThreadPoolCallbacks`
- Versioned C config struct `OrtThreadPoolCallbacksConfig` with fields:
`on_enqueue`, `on_start_work`, `on_stop_work`, `on_abandon`,
`user_context`
- Four callback typedefs: `OrtThreadPoolWorkEnqueueFn`,
`OrtThreadPoolWorkStartFn`, `OrtThreadPoolWorkStopFn`,
`OrtThreadPoolWorkAbandonFn`
## Implementation
- `EigenNonBlockingThreadPool.h`: Introduced a policy-based design with
  two compile-time callback policies:
  - `WorkNoCallbackPolicy`: `Work = std::function<void()>`; all callback
    methods are trivial inline functions that the compiler eliminates,
    so non-callback builds pay zero overhead.
  - `WorkWithCallbackPolicy`: `Work = WorkItem`, which bundles each task
    with its callback data; user callbacks are invoked around task
    execution via the `MakeWork`/`Execute`/`OnEnqueue`/`OnAbandon`
    methods.
- `ThreadPoolTempl<Environment, CallbackPolicy>` uses the policy for all
callback-related operations.
- `RunQueue::RevokeWithTag` calls `policy_->OnAbandon(e.w)` on
successful revocation; the policy implementation decides whether to
invoke user callbacks.
- `threadpool.h`: `extended_eigen_threadpool_` changed to
`unique_ptr<ExtendedThreadPoolInterface>` for type erasure across policy
instantiations. `EnableSpinning`/`DisableSpinning` added to the virtual
interface.
- `threadpool.cc`: A single `#ifdef` selects the policy when
`ThreadPoolTempl` is instantiated.
- `environment.h/.cc`: Added
`SetPerSessionWorkCallbacks`/`GetPerSessionWorkCallbacks` on
`Environment`.
- `inference_session.cc`: Propagates callbacks from `Environment` to
per-session thread pool options.
- `thread_utils.h/.cc`: Added callback fields to `OrtThreadPoolParams`
and wired them through `CreateThreadPoolHelper`.
- `env.h`: Added an `OrtThreadPoolCallbacksConfig*` pointer to `ThreadOptions`.
## Build
- CMake option: `onnxruntime_ENABLE_SESSION_THREADPOOL_CALLBACKS`
- `build.py` argument: `--enable_session_threadpool_callbacks`
## Tests
- 8 callback-specific tests: Schedule, OnEnqueueOnly, NoCallbacks,
ParallelFor, ParallelSection, Abandon, EnqueueReturnsNull,
NoEnqueueWithStartStop
- End-to-end C API test that calls `SetPerSessionThreadPoolCallbacks` and
runs a ModelBuilder-constructed model with a 1M-element Mul
- All 73 existing ThreadPool tests pass unchanged in both builds: 81/81
pass in the callback-enabled build (73 existing plus the 8 new callback
tests) and 73/73 in the callback-disabled build
## Motivation and Context
Thread pool work callbacks enable telemetry, tracing, and resource
management by providing visibility into when work is enqueued, executed,
and abandoned in per-session thread pools. This is needed for production
diagnostics and performance instrumentation scenarios.
---------
Co-authored-by: Siyuan Peng <siyuanpeng@microsoft.com>