[AMDGPU] Add DS loop wait optimization infrastructure (1/4)
Add the infrastructure for DS wait count optimization in single-block loops
with WMMA instructions (GFX12+). This patch adds the loop eligibility check.
This is the first of four patches that split up the DS loop wait optimization.
Subsequent patches will add:
- DS load position analysis
- Wait count relaxation
- Preheader flush and edge case handling
Assisted-by: Cursor / claude-4.5-opus-high