llvm
dcad3b2d - Do not register USM Shared allocations for cross-device tracking

Commit
50 days ago
Do not register USM Shared allocations for cross-device tracking

Root cause: CUDA Managed Memory (USM Shared) is automatically migrated between devices by the CUDA runtime. Manual peer-to-peer copies using cuMemcpyPeerAsync conflict with this automatic migration mechanism, preventing data from propagating correctly across devices.

Solution: Only register USM Device allocations in metadata tracking. When urEnqueueUSMMemcpy encounters untracked pointers (Shared memory), it uses cuMemcpyAsync, which lets the CUDA runtime handle migration transparently through the Unified Memory subsystem.

This fixes urEnqueueKernelLaunchIncrementMultiDeviceTest failures (Expected: 3-9, Got: 2). Kernels incrementing shared memory across devices were not seeing updated values because cuMemcpyPeerAsync bypassed CUDA's automatic migration.
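
The dispatch described above can be sketched roughly as follows. This is a minimal, CUDA-free illustration and not the adapter's actual code: AllocationTracker, AllocKind, CopyPath and chooseCopyPath are hypothetical names, and the rule that a peer copy is chosen only when both pointers are tracked on different devices is an assumption. The real change would call cuMemcpyPeerAsync or cuMemcpyAsync where the sketch returns an enum value.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical names throughout; the real adapter's types are not shown in the commit.
enum class AllocKind { Device, Shared };
enum class CopyPath { PeerAsync, PlainAsync };

// Registry mapping an allocation's base address to its size and owning device.
class AllocationTracker {
public:
  // Only USM Device allocations are recorded. Shared (managed) allocations
  // are deliberately left untracked so the CUDA runtime can migrate them.
  void onAlloc(const void *ptr, std::size_t size, int device, AllocKind kind) {
    if (kind != AllocKind::Device)
      return;
    tracked_[reinterpret_cast<std::uintptr_t>(ptr)] = {size, device};
  }

  // Returns true and sets `device` if `ptr` lies inside a tracked allocation.
  bool lookup(const void *ptr, int &device) const {
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    auto it = tracked_.upper_bound(addr); // first base address above addr
    if (it == tracked_.begin())
      return false;
    --it; // candidate allocation starting at or below addr
    if (addr >= it->first + it->second.size)
      return false;
    device = it->second.device;
    return true;
  }

private:
  struct Entry {
    std::size_t size;
    int device;
  };
  std::map<std::uintptr_t, Entry> tracked_;
};

// Assumed rule: use a peer copy only when both pointers are tracked Device
// allocations on different devices; otherwise fall back to a plain async copy.
CopyPath chooseCopyPath(const AllocationTracker &tracker, const void *dst,
                        const void *src) {
  int dstDev = 0, srcDev = 0;
  const bool bothTracked =
      tracker.lookup(dst, dstDev) && tracker.lookup(src, srcDev);
  if (bothTracked && dstDev != srcDev)
    return CopyPath::PeerAsync; // real code would call cuMemcpyPeerAsync
  return CopyPath::PlainAsync;  // real code would call cuMemcpyAsync
}
```

Because Shared allocations never enter the registry, any copy involving one takes the plain path, which matches the behaviour the commit message describes for untracked pointers.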