llvm-project
d8a9c560 - [CUDA] refactor in-header implementation of __ld*/__st* with different cache modes. (#190021)

Commit
13 days ago
[CUDA] refactor in-header implementation of __ld*/__st* with different cache modes. (#190021)

* Generalized creation of the variant sets.
* Added implementations for the missing operation modes. Now we match what's available in CUDA headers.
* Cleaned up discrepancies in `__asm__ __volatile__` use (it is needed for some ops that warm up the cache and therefore must not be discarded when the load result is unused).

Manually verified that clang's versions of these functions generate exactly the same instructions nvcc generates from CUDA headers.
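To illustrate the `__asm__ __volatile__` point, here is a minimal sketch of a cache-mode load written with inline PTX. It is not the code from the commit: the names `sketch_ldcg`, `sketch_ldcg_keep`, `demo`, and `run_demo` are hypothetical, and the choice of `ld.global.cg` for both variants is an assumption made only to contrast plain and volatile asm. Which operations the header actually marks volatile is not stated here.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not a name from clang's headers): 32-bit load with the
// .cg cache operator. Plain asm, so the compiler is free to drop the
// instruction when the returned value is never used.
__device__ inline int sketch_ldcg(const int *ptr) {
  int ret;
  asm("ld.global.cg.s32 %0, [%1];" : "=r"(ret) : "l"(ptr));
  return ret;
}

// Same instruction, but volatile asm with a memory clobber: the compiler is
// expected to keep it even if the result is discarded, which matters when the
// load is issued only for its effect on the cache.
__device__ inline int sketch_ldcg_keep(const int *ptr) {
  int ret;
  asm volatile("ld.global.cg.s32 %0, [%1];" : "=r"(ret) : "l"(ptr) : "memory");
  return ret;
}

__global__ void demo(const int *in, int *out) {
  int i = threadIdx.x;
  out[i] = sketch_ldcg(in + i);    // result used, load is needed anyway
  (void)sketch_ldcg_keep(in + i);  // result unused, volatile keeps the load
}

// Host-side check: copies data through the kernel and compares the results.
bool run_demo() {
  const int n = 32;
  int host_in[n], host_out[n];
  for (int i = 0; i < n; ++i) host_in[i] = i * 3;

  int *dev_in = nullptr, *dev_out = nullptr;
  if (cudaMalloc(&dev_in, n * sizeof(int)) != cudaSuccess) return false;
  if (cudaMalloc(&dev_out, n * sizeof(int)) != cudaSuccess) {
    cudaFree(dev_in);
    return false;
  }
  cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);
  demo<<<1, n>>>(dev_in, dev_out);
  cudaError_t err =
      cudaMemcpy(host_out, dev_out, sizeof(host_out), cudaMemcpyDeviceToHost);
  cudaFree(dev_in);
  cudaFree(dev_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < n; ++i)
    if (host_out[i] != host_in[i]) return false;
  return true;
}
```

Calling `run_demo()` from a host `main` only checks that the loaded values round-trip. Whether the second load survives optimization has to be confirmed by inspecting the generated PTX, in the same spirit as the manual comparison against nvcc described above.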