llvm-project
5ecc6d19 - [mlir][AMDGPU] Use LDS-only MMRA fences for lds_barrier (#157919)

Commit
218 days ago
The semantics of amdgpu.lds_barrier are "s.barrier, and all LDS operations before this happen-before LDS operations after this, and there must not be an inherent fence/forcing-to-completion of global memory (for performance)". The previous lowering strategy implemented this through manual calls to waitcnt() intrinsics and the s_barrier intrinsic(s). The lack of explicit fencing enabled miscompiles on gfx12, where LDS accesses were reordered with the barrier.

LLVM now allows MMRA annotations that ensure a pair of fences orders only LDS accesses, so we can use such fences to explicitly represent the semantics we want instead of trying to prescribe how they are implemented.

Note that the gfx908 workaround of hiding the s_barrier in inline assembly to prevent spurious vmem barriers remains in place, but it is removed for gfx11 because the fences were recently changed to give us the effect we want.