Add WAITPKG checks, add support for TPAUSE within SpinPause (#24524)
### Description
This change adds `TPAUSE` support to the `SpinPause()` function on
Windows and Linux to reduce power consumption and improve efficiency
during spin-wait periods. `TPAUSE` is a lightweight instruction from the
WAITPKG ISA extension that puts the processor into an optimized C0 power
state until a timestamp-counter deadline is reached, whereas
`_mm_pause()` emits `PAUSE`, a NOP-like hint that only inserts a short
delay in the CPU pipeline. The change can also improve First Inference
Latency for certain models: models tested internally showed up to ~2x
improvement in First Inference Latency and up to ~20% lower overall
power consumption.
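
As a rough illustration of the difference between the two wait
primitives, the sketch below picks `TPAUSE` when the caller reports
WAITPKG support and falls back to `_mm_pause()` otherwise. It is not the
code from this PR: `SpinPauseSketch`, `TpauseFor`, and the
`kSpinDelayCycles` value are hypothetical names chosen for the example,
and the `target("waitpkg")` attribute assumes GCC or Clang.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause, _tpause
#include <x86intrin.h>  // __rdtsc
#define SKETCH_X86 1
#endif

// Hypothetical delay for illustration only; the PR tunes its own
// spin_delay_cycles value.
constexpr std::uint64_t kSpinDelayCycles = 1000;

#if defined(SKETCH_X86)
// The target attribute lets GCC/Clang emit TPAUSE in this one function
// without compiling the whole file with -mwaitpkg.
__attribute__((target("waitpkg")))
static void TpauseFor(std::uint64_t cycles) {
  // Control bit 0: 0 requests C0.2 (more power saving, slower wakeup),
  // 1 requests C0.1. The wait ends when the TSC reaches the deadline
  // (or earlier, for example on an interrupt or an OS-imposed limit).
  _tpause(0x0, __rdtsc() + cycles);
}
#endif

// has_waitpkg must come from a runtime CPUID check: executing TPAUSE on
// a CPU without WAITPKG raises an invalid-opcode fault.
inline void SpinPauseSketch(bool has_waitpkg) {
#if defined(SKETCH_X86)
  if (has_waitpkg) {
    TpauseFor(kSpinDelayCycles);
  } else {
    _mm_pause();  // PAUSE: short pipeline delay, no power-state change
  }
#else
  (void)has_waitpkg;  // no-op on non-x86 targets in this sketch
#endif
}
```

A spin loop would call `SpinPauseSketch(flag)` on each iteration, with
`flag` computed once at startup rather than on every call.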
The Genuine Intel CPUID detection logic was also refactored into a
shared utility (`CheckIntel()`), enabling consistent platform checks
across the codebase. `TPAUSE` is enabled by default on processors that
support it.
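
The kind of check involved can be sketched as follows. This is not the
actual `CheckIntel()` implementation: `IsGenuineIntel`, `HasWaitpkgBit`,
and `CanUseTpause` are hypothetical helpers, and the `<cpuid.h>` path
assumes GCC or Clang on x86 (MSVC would use `__cpuidex` from
`<intrin.h>` instead). The CPUID facts used are that leaf 0 returns the
vendor string in EBX, EDX, ECX and that WAITPKG is reported in
CPUID.(EAX=07H, ECX=0):ECX bit 5.

```cpp
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>  // __get_cpuid, __get_cpuid_count
#define SKETCH_HAS_CPUID 1
#endif

// CPUID leaf 0 spells "GenuineIntel" across EBX ("Genu"), EDX ("ineI"),
// and ECX ("ntel"), each packed little-endian into a 32-bit register.
inline bool IsGenuineIntel(std::uint32_t ebx, std::uint32_t ecx,
                           std::uint32_t edx) {
  return ebx == 0x756e6547u && edx == 0x49656e69u && ecx == 0x6c65746eu;
}

// WAITPKG (TPAUSE, UMONITOR, UMWAIT) is bit 5 of ECX for leaf 7, subleaf 0.
inline bool HasWaitpkgBit(std::uint32_t leaf7_ecx) {
  return ((leaf7_ecx >> 5) & 1u) != 0;
}

// Returns true only on a Genuine Intel CPU that reports WAITPKG.
inline bool CanUseTpause() {
#if defined(SKETCH_HAS_CPUID)
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return false;
  if (!IsGenuineIntel(ebx, ecx, edx)) return false;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
  return HasWaitpkgBit(ecx);
#else
  return false;  // unknown platform: stay on the conservative path
#endif
}
```

Keeping the register comparisons in pure helpers makes the vendor and
feature-bit logic testable without depending on the host CPU.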
[Intel Intrinsics Guide
(TPAUSE)](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=tpause&techs=MMX,SSE_ALL,AVX_ALL,AVX_512,AMX,SVML,Other&ig_expand=6888,6888)
### Motivation and Context
Performance and power efficiency gains. An earlier PR first introduced
the `TPAUSE` instruction in `SpinPause()` with measured improvements in
power (see [Add WAITPKG checks, add support for TPAUSE in ThreadPool spin
#16935](https://github.com/microsoft/onnxruntime/pull/16935)).
Additional performance testing and measurements across Mobile, Desktop,
and Server then led to the enhancements in this PR: a tweak to
`spin_delay_cycles`, Linux support, and the refactored Intel CPUID
detection logic.