Commit

338 days ago

[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain conditions. This change extends the backend to lower these loads/stores. It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st
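To make the gating described above concrete, here is a minimal standalone sketch of how a vector register width query might depend on the target and address space. It is not the code from this commit: `TargetInfo`, `has256BitGlobalVectorAccess`, `loadStoreVecRegBitWidth`, and `maxVectorElements` are hypothetical names, and the 128-bit fallback for every other case is an assumption. The only conditions modeled are the ones named in the commit title (global address space, sm_100+, PTX 8.8+); the spec may impose others.

```cpp
// Hypothetical stand-in for the subtarget; not LLVM's actual class.
struct TargetInfo {
  unsigned smVersion;   // e.g. 100 for sm_100
  unsigned ptxVersion;  // e.g. 88 for PTX 8.8
};

// NVPTX numbers the global address space as 1.
constexpr unsigned kGlobalAddrSpace = 1;

// True only when all three conditions from the commit title hold.
constexpr bool has256BitGlobalVectorAccess(TargetInfo t, unsigned addrSpace) {
  return t.smVersion >= 100 && t.ptxVersion >= 88 &&
         addrSpace == kGlobalAddrSpace;
}

// Widest vector access, in bits, that a vectorizer would be allowed to form.
// Assumption: everything that does not qualify for 256 bits falls back to 128.
constexpr unsigned loadStoreVecRegBitWidth(TargetInfo t, unsigned addrSpace) {
  return has256BitGlobalVectorAccess(t, addrSpace) ? 256 : 128;
}

// Number of elements of the given bit width that fit in one vector access.
constexpr unsigned maxVectorElements(TargetInfo t, unsigned addrSpace,
                                     unsigned elemBits) {
  return loadStoreVecRegBitWidth(t, addrSpace) / elemBits;
}
```

Under these assumptions, a target with `smVersion = 100` and `ptxVersion = 88` could combine up to eight adjacent 32-bit global accesses into one vector operation, while an older target or a non-global address space stays at four.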

References

#139292 - [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+

Author

dakersnar

Parents

952b680f

llvm-project a1e1a84d - [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)