llvm-project
a1e1a84d - [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

Commit
338 days ago
[NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (#139292)

PTX 8.8+ introduces 256-bit-wide vector loads/stores under certain conditions. This change extends the backend to lower these loads/stores. It also overrides getLoadStoreVecRegBitWidth for NVPTX, allowing the LoadStoreVectorizer to create these wider vector operations.

See the spec for the three relevant PTX instructions here:
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-ld-global-nc
- https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st
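
The width decision described in the message can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: `SubtargetSketch`, `has256BitGlobalVectors`, and `loadStoreVecRegBitWidth` are hypothetical stand-ins, and the 128-bit fallback is an assumption about the width used when the 256-bit path does not apply. Only the conditions named in the commit title (global address space, sm_100+, PTX 8.8+) are taken from the source.

```cpp
// Hypothetical model of the vector-width query; not LLVM's actual classes.

// NVPTX numbers the global address space as 1.
constexpr unsigned kGlobalAddrSpace = 1;

struct SubtargetSketch {
  unsigned SmVersion;  // e.g. 100 for sm_100
  unsigned PtxVersion; // e.g. 88 for PTX 8.8

  // Conditions taken from the commit title: sm_100+ and ptx88+.
  constexpr bool has256BitGlobalVectors() const {
    return SmVersion >= 100 && PtxVersion >= 88;
  }
};

// Returns the widest load/store vector width (in bits) that a vectorizer
// would be allowed to form for the given address space.
constexpr unsigned loadStoreVecRegBitWidth(const SubtargetSketch &ST,
                                           unsigned AddrSpace) {
  if (AddrSpace == kGlobalAddrSpace && ST.has256BitGlobalVectors())
    return 256;
  return 128; // assumed fallback width for every other case
}

// Usage: only global accesses on a new enough target get the wider width.
static_assert(loadStoreVecRegBitWidth({100, 88}, kGlobalAddrSpace) == 256,
              "sm_100 + PTX 8.8, global: 256-bit");
static_assert(loadStoreVecRegBitWidth({90, 88}, kGlobalAddrSpace) == 128,
              "older SM: falls back to 128-bit");
```

Keying the answer on the address space as well as the target means the vectorizer widens only global accesses, which is the one case the commit says the backend can lower.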