onnxruntime
5e36544d - Add WAITPKG checks, add support for TPAUSE within SpinPause (#24524)

Commit
281 days ago
Add WAITPKG checks, add support for TPAUSE within SpinPause (#24524)

### Description

This change introduces `TPAUSE` support in the `SpinPause()` function on Windows and Linux to reduce power consumption and improve efficiency during spin-wait periods. `TPAUSE` is a lightweight power/performance instruction that enters an optimized C0 power state while waiting on a delay event, whereas `_mm_pause()` is a NOP-like instruction that provides a small delay in the CPU pipeline.

This change can also improve First Inference Latency for certain models. Models tested internally showed up to ~2x improvement in First Inference Latency and up to ~20% lower overall power consumption.

The Genuine Intel CPUID detection logic was also refactored into a shared utility (`CheckIntel()`), enabling consistent platform checks across the codebase.

`TPAUSE` is enabled by default on architectures that support it.

[Intel Intrinsics Guide (TPAUSE)](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=tpause&techs=MMX,SSE_ALL,AVX_ALL,AVX_512,AMX,SVML,Other&ig_expand=6888,6888)

### Motivation and Context

Performance and power efficiency gains. An earlier PR first introduced the `TPAUSE` instruction in `SpinPause()` with measured improvements in power; see [Add WAITPKG checks, add support for TPAUSE in ThreadPool spin #16935](https://github.com/microsoft/onnxruntime/pull/16935).

Additional performance testing and measurements were done across Mobile, Desktop, and Server. These influenced enhancements to the PR: a tweak to `spin_delay_cycles`, Linux support, and the refactored Intel CPUID detection logic.
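
The detect-then-fall-back behavior described in this commit message can be illustrated with a minimal sketch. This is not the code from the PR: `IsGenuineIntelSketch`, `HasWaitPkgSketch`, `TpauseFor`, `SpinPauseSketch`, and the `kSpinDelayCycles` value are hypothetical names and numbers chosen for illustration. It assumes GCC or Clang on x86-64 and omits the MSVC path. WAITPKG support is reported in CPUID leaf 7, subleaf 0, ECX bit 5.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__x86_64__)
#include <cpuid.h>      // __get_cpuid, __get_cpuid_count (GCC/Clang)
#include <x86intrin.h>  // _tpause, __rdtsc, _mm_pause
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

// Hypothetical stand-in for a vendor check such as CheckIntel():
// CPUID leaf 0 returns the vendor string in EBX, EDX, ECX (in that order).
bool IsGenuineIntelSketch() {
#if SKETCH_X86
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return false;
  char vendor[13] = {};
  std::memcpy(vendor + 0, &ebx, 4);
  std::memcpy(vendor + 4, &edx, 4);
  std::memcpy(vendor + 8, &ecx, 4);
  return std::strcmp(vendor, "GenuineIntel") == 0;
#else
  return false;
#endif
}

// WAITPKG (TPAUSE/UMONITOR/UMWAIT) is CPUID.(EAX=7,ECX=0):ECX bit 5.
bool HasWaitPkgSketch() {
#if SKETCH_X86
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
  return (ecx & (1u << 5)) != 0;
#else
  return false;
#endif
}

#if SKETCH_X86
// The target attribute enables the intrinsic for this one function, so the
// file compiles without -mwaitpkg. Call it only after the CPUID check passes.
__attribute__((target("waitpkg")))
static void TpauseFor(std::uint64_t cycles) {
  // Control value 0 requests the deeper C0.2 state; the wait ends once the
  // TSC reaches the deadline (or earlier, on an OS-imposed limit or interrupt).
  _tpause(0x0, __rdtsc() + cycles);
}
#endif

// Spin-wait hint: TPAUSE where available, otherwise the classic PAUSE.
void SpinPauseSketch() {
#if SKETCH_X86
  static const bool use_tpause = IsGenuineIntelSketch() && HasWaitPkgSketch();
  constexpr std::uint64_t kSpinDelayCycles = 1000;  // assumed value, illustration only
  if (use_tpause) {
    TpauseFor(kSpinDelayCycles);
  } else {
    _mm_pause();
  }
#endif
}
```

Caching the detection result in a function-local `static` keeps the CPUID queries off the hot spin path. On non-x86-64 targets the sketch compiles to a no-op.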