[AMDGPU] Preserve metadata in all barrier lowering paths (#191916)

Extend copyMetadata to every call-to-call replacement in
AMDGPULowerIntrinsics, not just the single-wave s_barrier →
wave_barrier path. This covers:
- s_cluster_barrier → wave_barrier (single-wave)
- s_cluster_barrier → signal_isfirst + wait + signal + wait (multi-wave)
- s_barrier → signal + wait (split barriers)
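
The paths listed above can be illustrated with a minimal, self-contained
sketch. It is a toy model, not the pass itself: `ToyCall`, `lowerBarrier`,
and the metadata key used in the usage note are hypothetical names, and the
sketch assumes every call in a lowered sequence receives the original call's
metadata.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a call instruction (hypothetical; NOT llvm::CallInst).
struct ToyCall {
  std::string Callee;
  std::map<std::string, std::string> Metadata; // metadata kind -> payload

  // Models the idea of copying all metadata from the call being replaced.
  void copyMetadata(const ToyCall &Src) { Metadata = Src.Metadata; }
};

// Replace one barrier call with its lowered sequence and copy the original
// call's metadata onto every new call. The sequences follow the list above;
// the SingleWave flag is an assumption of this sketch.
std::vector<ToyCall> lowerBarrier(const ToyCall &Old, bool SingleWave) {
  std::vector<std::string> Names;
  if (SingleWave)
    Names = {"wave_barrier"};
  else if (Old.Callee == "s_cluster_barrier")
    Names = {"signal_isfirst", "wait", "signal", "wait"};
  else
    Names = {"signal", "wait"};

  std::vector<ToyCall> New;
  for (const std::string &Name : Names) {
    ToyCall C{Name, {}};
    C.copyMetadata(Old); // the step this change extends to every path
    New.push_back(C);
  }
  return New;
}
```

For example, lowering a multi-wave `s_cluster_barrier` that carries one
metadata entry yields four calls in this model, each carrying that entry.
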

Add GFX11 and GFX12 RUN lines and test functions for all lowering
paths to verify metadata preservation.

Made-with: Cursor