[AMDGPU] Enable multi-group xnack replay in hardware (GFX1250) (#169016)
This patch enables multi-group xnack replay mode on GFX1250
by configuring the hardware MODE register at kernel entry.
This makes the hardware behavior match the multi-group
s_wait_xcnt insertion logic the compiler already uses.