llvm
3447c1f0 - [CUDA][UR] Use cuMemcpyPeer for cross-device USM copies

Commit
76 days ago
[CUDA][UR] Use cuMemcpyPeer for cross-device USM copies

The previous synchronous cuMemcpy approach failed because it cannot properly handle cross-device copies even in synchronous mode. cuMemcpyPeer explicitly takes source and destination contexts as parameters and is designed for peer-to-peer copies between different device contexts. This works for both USM Device and USM Shared memory.

The stream is synchronized before calling cuMemcpyPeer because:
1. cuMemcpyPeer is synchronous (blocks until complete)
2. We need to ensure all pending operations in the stream finish first
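
A minimal sketch of the ordering this message describes (drain the stream, then issue the blocking peer copy), not the adapter's actual code. The names `crossDeviceCopy`, `Driver`, `FakeDriver`, and `runSketch` are hypothetical and exist only for illustration. A real driver wrapper would be expected to forward to `cuStreamSynchronize(CUstream)` and `cuMemcpyPeer(CUdeviceptr dstDevice, CUcontext dstContext, CUdeviceptr srcDevice, CUcontext srcContext, size_t ByteCount)` from `<cuda.h>`; the recording stand-in below lets the sketch run without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical seam: `Driver` is any type exposing streamSynchronize() and
// memcpyPeer(). It is NOT part of the UR adapter; it only keeps the sketch
// testable without the CUDA driver library.
template <typename Driver>
int crossDeviceCopy(Driver &drv, typename Driver::Stream stream,
                    typename Driver::Ptr dst, typename Driver::Context dstCtx,
                    typename Driver::Ptr src, typename Driver::Context srcCtx,
                    std::size_t bytes) {
  // Step 1: wait for all work already queued on the stream to finish.
  if (int err = drv.streamSynchronize(stream))
    return err; // Do not start the copy if the stream failed to drain.
  // Step 2: blocking copy that names both contexts explicitly.
  return drv.memcpyPeer(dst, dstCtx, src, srcCtx, bytes);
}

// Recording stand-in for the driver (assumption: 0 means success, mirroring
// CUDA_SUCCESS). It copies nothing; it only logs the call order.
struct FakeDriver {
  using Stream = int;
  using Context = int;
  using Ptr = std::uintptr_t;

  std::vector<std::string> calls;
  int syncResult = 0;

  int streamSynchronize(Stream) {
    calls.push_back("sync");
    return syncResult;
  }
  int memcpyPeer(Ptr, Context, Ptr, Context, std::size_t) {
    calls.push_back("memcpyPeer");
    return 0;
  }
};

// Usage: copy 64 bytes from context 0 to context 1 and check the call order.
inline void runSketch() {
  FakeDriver drv;
  int rc = crossDeviceCopy(drv, /*stream=*/0, /*dst=*/0x2000, /*dstCtx=*/1,
                           /*src=*/0x1000, /*srcCtx=*/0, /*bytes=*/64);
  assert(rc == 0);
  assert(drv.calls.size() == 2 && drv.calls[0] == "sync");
}
```

Returning early on a failed synchronization is a design choice of this sketch: it avoids starting a copy while earlier stream work may still be pending, which is the second reason listed above.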