Reland "[NVPTX][AtomicExpandPass] Complete support for AtomicRMW in NVPTX (#176015)" (#179553)
This PR adds full support for atomicrmw in NVPTX. This includes:
- Memory order and syncscope support (changes in AtomicExpandPass.cpp,
NVPTXIntrinsics.td)
- Script-generated tests for integer and atomic operations
(atomicrmw.py, atomicrmw-sm*.ll in test/CodeGen/NVPTX). Existing
atomics tests that these subsume have been removed
(atomics-sm*.ll, atomics.ll, atomicrmw-expand.ll).
- ~~Changes shouldExpandAtomicRMWInIR to take a constant argument: This
is to allow some other TargetLowering constant-argument functions to
call it. This change touches several backends. An alternative solution
exists, but to me, this seems the "right" way.~~ Has been split out into
https://github.com/llvm/llvm-project/pull/176073. Rebased.
- NOTE: The initial load issued for atomicrmw emulation loops (and
cmpxchg emulation loops) must be a strong load. Currently,
AtomicExpandPass issues a weak load. Fixing this breaks several
backends. I'm planning to follow up with a separate PR.
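
For illustration, the script-generated tests can be thought of as a sweep over operation, memory order and syncscope. The sketch below is a hypothetical generator, not the actual atomicrmw.py: the operation subset, the scope names ("block", "device") and the function naming scheme are assumptions chosen for the example.

```python
from itertools import product

# Illustrative subsets only; the real atomicrmw.py may cover different values.
OPERATIONS = ["add", "xchg", "max"]
ORDERINGS = ["monotonic", "acquire", "release", "acq_rel", "seq_cst"]
# "" stands for the default (system) scope; the named scopes are assumptions.
SCOPES = ["", "block", "device"]


def emit_test(op: str, ordering: str, scope: str, ty: str = "i32") -> str:
    """Return one LLVM IR function exercising a single atomicrmw variant."""
    scope_attr = f'syncscope("{scope}") ' if scope else ""
    name = f"{op}_{ordering}_{scope or 'system'}_{ty}"
    return (
        f"define {ty} @{name}(ptr %addr, {ty} %val) {{\n"
        f"  %old = atomicrmw {op} ptr %addr, {ty} %val {scope_attr}{ordering}\n"
        f"  ret {ty} %old\n"
        f"}}\n"
    )


def generate_tests() -> list[str]:
    """Emit one test function per (operation, ordering, scope) combination."""
    return [
        emit_test(op, ordering, scope)
        for op, ordering, scope in product(OPERATIONS, ORDERINGS, SCOPES)
    ]


if __name__ == "__main__":
    print("\n".join(generate_tests()))
```

Generating the cases from a script keeps the cross product of orderings and scopes exhaustive without hand-maintaining each function.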

The initial landing failed with the error: ptxas fatal : Value 'sm_60'
is not defined for option 'gpu-name'. This reland updates the RUN lines
in atomicrmw-sm*.py to skip the ptxas-verify check if the installed
ptxas does not support that SM version.
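
As a rough sketch of the guarded RUN line, the hypothetical helper below builds a pair of RUN lines in which the ptxas-verify step is wrapped in a lit `%if` conditional. The feature name `ptxas-sm_<N>` and the llc flags follow the convention seen in other NVPTX tests, but the exact lines used in this change are an assumption.

```python
def run_lines(sm: int, ptx: int) -> list[str]:
    """Build RUN lines for one SM version (hypothetical helper).

    The second line only runs ptxas-verify when lit reports that the
    installed ptxas supports the requested SM version.
    """
    llc = f"llc < %s -mtriple=nvptx64 -mcpu=sm_{sm} -mattr=+ptx{ptx}"
    return [
        f"; RUN: {llc} | FileCheck %s",
        # "%{{" and "%}}" are f-string escapes for lit's "%{" and "%}".
        f"; RUN: %if ptxas-sm_{sm} %{{ {llc} | %ptxas-verify -arch=sm_{sm} %}}",
    ]


if __name__ == "__main__":
    print("\n".join(run_lines(60, 50)))
```

With this shape, a toolchain whose ptxas has dropped sm_60 still runs the FileCheck line and skips only the verification step.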