llvm-project
d8a9c560 - [CUDA] refactor in-header implementation of __ld*/__st* with different cache modes. (#190021)

Commit

13 days ago

[CUDA] refactor in-header implementation of __ld*/__st* with different cache modes. (#190021)

* Generalized creation of the variant sets.
* Added implementations for the missing operation modes. Now we match what's available in CUDA headers.
* Cleaned up discrepancies in `__asm__ __volatile__` use (needed for some ops that warm up the cache, but should not be discarded if the load result is unused)

Manually verified that clang's versions of these functions generate exactly the same instructions nvcc generates from CUDA headers.
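For readers unfamiliar with this family of functions, here is a minimal sketch of what an in-header load or store with an explicit cache mode can look like when written as inline PTX. It is not the code from this commit. The helper names `sketch_ld_cg` and `sketch_st_cs`, the kernel `scale`, and the host function `run_scale_check` are hypothetical, and the sketch assumes a 64-bit target (hence the `"l"` pointer constraint). Marking the load `volatile` illustrates the point from the commit message: the compiler keeps the instruction even when its result is unused.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper, not the header's actual code: a float load with the
// PTX ".cg" cache operator. "volatile" stops the compiler from discarding
// the load when the returned value is never used.
__device__ __forceinline__ float sketch_ld_cg(const float *p) {
  float v;
  asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p) : "memory");
  return v;
}

// Hypothetical helper: a float store with the PTX ".cs" (streaming) operator.
__device__ __forceinline__ void sketch_st_cs(float *p, float v) {
  asm volatile("st.global.cs.f32 [%0], %1;" : : "l"(p), "f"(v) : "memory");
}

// Multiplies each input element by k using the two helpers above.
__global__ void scale(const float *in, float *out, int n, float k) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    sketch_st_cs(out + i, sketch_ld_cg(in + i) * k);
}

// Runs the kernel on a small array and compares every element on the host.
// Returns false on any CUDA error or mismatch.
bool run_scale_check() {
  const int n = 256;
  float host_in[n], host_out[n];
  for (int i = 0; i < n; ++i) host_in[i] = static_cast<float>(i);

  float *dev_in = nullptr, *dev_out = nullptr;
  if (cudaMalloc(&dev_in, sizeof(host_in)) != cudaSuccess) return false;
  if (cudaMalloc(&dev_out, sizeof(host_out)) != cudaSuccess) {
    cudaFree(dev_in);
    return false;
  }

  bool ok = cudaMemcpy(dev_in, host_in, sizeof(host_in),
                       cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    scale<<<(n + 127) / 128, 128>>>(dev_in, dev_out, n, 2.0f);
    ok = cudaMemcpy(host_out, dev_out, sizeof(host_out),
                    cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(dev_in);
  cudaFree(dev_out);

  for (int i = 0; ok && i < n; ++i) ok = (host_out[i] == 2.0f * host_in[i]);
  return ok;
}
```

In real code you would normally call the intrinsics from the headers, such as `__ldcg` and `__stcs`, rather than hand-written wrappers like these. Calling `run_scale_check()` from a host `main` is enough to exercise the sketch on a machine with a CUDA-capable GPU.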

References

#190021 - [CUDA] refactor in-header implementation of __ld*/__st* with different cache modes.

Author

Artem-B

Parents

f2b33d79
