llvm
dcad3b2d - Do not register USM Shared allocations for cross-device tracking

Commit

50 days ago

Do not register USM Shared allocations for cross-device tracking

Root cause: CUDA Managed Memory (USM Shared) is automatically migrated between devices by the CUDA runtime. Manual peer-to-peer copies using cuMemcpyPeerAsync conflict with this automatic migration mechanism, preventing data from propagating correctly across devices.

Solution: Only register USM Device allocations in metadata tracking. When urEnqueueUSMMemcpy encounters untracked pointers (Shared memory), it uses cuMemcpyAsync which allows CUDA runtime to handle migration transparently through the Unified Memory subsystem.

This fixes urEnqueueKernelLaunchIncrementMultiDeviceTest failures where Expected: 3-9, Got: 2 (kernels incrementing shared memory across devices were not seeing updated values because cuMemcpyPeerAsync bypassed CUDA's automatic migration).
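The dispatch rule described above can be sketched as a small host-only model. This is not the adapter's actual code: `AllocationTracker`, `UsmKind`, `CopyPath`, and `chooseCopyPath` are hypothetical names, and no CUDA calls are made. The sketch also assumes the peer path is taken only when both pointers are tracked and live on different devices; the commit message does not spell out the mixed case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical model of the tracking decision; names are illustrative only.
enum class UsmKind { Device, Shared };
enum class CopyPath { PeerAsync, PlainAsync }; // stand-ins for cuMemcpyPeerAsync / cuMemcpyAsync

struct AllocInfo {
  std::size_t size;
  int device;
};

class AllocationTracker {
public:
  // Record only USM Device allocations; Shared (managed) memory is left
  // untracked so the CUDA runtime can migrate it on its own.
  void onAlloc(const void *ptr, std::size_t size, int device, UsmKind kind) {
    assert(size > 0);
    if (kind != UsmKind::Device)
      return;
    table_[reinterpret_cast<std::uintptr_t>(ptr)] = AllocInfo{size, device};
  }

  // Return the tracked allocation containing ptr, or nullptr if untracked.
  const AllocInfo *find(const void *ptr) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = table_.upper_bound(addr); // first base address above addr
    if (it == table_.begin())
      return nullptr;
    --it; // candidate allocation starting at or below addr
    return (addr - it->first < it->second.size) ? &it->second : nullptr;
  }

private:
  std::map<std::uintptr_t, AllocInfo> table_;
};

// Assumption: use the peer path only when both ends are tracked Device
// allocations on different devices; everything else takes the plain path.
CopyPath chooseCopyPath(const AllocationTracker &tracker, const void *dst,
                        const void *src) {
  const AllocInfo *d = tracker.find(dst);
  const AllocInfo *s = tracker.find(src);
  if (d && s && d->device != s->device)
    return CopyPath::PeerAsync;
  return CopyPath::PlainAsync;
}
```

In this model a Shared pointer never appears in the table, so any copy involving it falls through to the plain path, which corresponds to the behaviour the commit message attributes to cuMemcpyAsync.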

Author

kekaczma

Parents

abf07483
