llvm-project
cf25346d - [AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs (#177620)

Commit

12 days ago

[AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs (#177620)

This patch optimizes the insertion of s_wait_xcnt instructions for sequences of atomic read-modify-write (RMW) operations in the SIInsertWaitcnts pass.

As part of PR 168852, the Memory Legalizer conservatively inserts a soft s_wait_xcnt before each atomic RMW operation, which is correct given the nature of atomic operations. However, in a run of back-to-back atomic RMWs only the first s_wait_xcnt is necessary, and removing the others improves runtime performance.

This patch tracks atomic RMW blocks within each basic block and removes the redundant soft s_wait_xcnt instructions, keeping only the first wait in each sequence. An atomic RMW block continues through subsequent atomic RMWs and non-memory instructions (e.g., ALU operations), but it is broken by CU-scoped memory operations, atomic stores, or basic block boundaries.
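The block-tracking rule in the commit message can be modeled as a single pass over each basic block. The following is a minimal Python sketch under stated assumptions, not the SIInsertWaitcnts code. The `Instr` class, its `kind` strings, and `drop_redundant_xcnt_waits` are hypothetical. The sketch also treats every memory instruction other than an atomic RMW as ending the block. That is a conservative reading which covers the CU-scoped memory operations and atomic stores the message names.

```python
from dataclasses import dataclass


# Hypothetical instruction model; none of these names come from LLVM.
@dataclass(frozen=True)
class Instr:
    kind: str           # e.g. "s_wait_xcnt", "atomic_rmw", "alu", "load", "atomic_store"
    soft: bool = False  # True for a conservatively inserted (removable) wait


# Non-memory kinds that keep an atomic RMW block open (assumption: only ALU here).
NON_MEMORY_KINDS = {"alu"}


def drop_redundant_xcnt_waits(block: list[Instr]) -> list[Instr]:
    """Keep only the first soft s_wait_xcnt in each run of back-to-back atomic RMWs."""
    out = []
    in_rmw_block = False  # starts False at every basic-block boundary
    for mi in block:
        if mi.kind == "s_wait_xcnt":
            if mi.soft and in_rmw_block:
                continue  # redundant: an earlier wait already covers this RMW run
            out.append(mi)  # first soft wait of a run, or a non-soft wait: keep
        elif mi.kind == "atomic_rmw":
            in_rmw_block = True
            out.append(mi)
        elif mi.kind in NON_MEMORY_KINDS:
            out.append(mi)  # ALU work does not break the block
        else:
            in_rmw_block = False  # atomic store or other memory op ends the block
            out.append(mi)
    return out


def run_on_function(blocks: list[list[Instr]]) -> list[list[Instr]]:
    # Each basic block is processed independently, so blocks never span edges.
    return [drop_redundant_xcnt_waits(bb) for bb in blocks]


if __name__ == "__main__":
    W = Instr("s_wait_xcnt", soft=True)
    RMW, ALU, ST = Instr("atomic_rmw"), Instr("alu"), Instr("atomic_store")
    kept = drop_redundant_xcnt_waits([W, RMW, W, RMW, ALU, W, RMW, ST, W, RMW])
    print(sum(i.kind == "s_wait_xcnt" for i in kept))  # prints 2 (of 4 waits)
```

In the usage example, the atomic store ends the first RMW run. As a result, the wait that follows it is kept as the first wait of a new run. Non-soft waits are never removed. This mirrors the commit's restriction to the soft waits that the Memory Legalizer inserts.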

References

#177620 - [AMDGPU][GFX1250] Optimize s_wait_xcnt for back-to-back atomic RMWs

Author

cdevadas

Parents

6451685b
