llvm
66c6a4ca - cuda: Track USM allocation metadata for cross-device operations

Commit

36 days ago

cuda: Track USM allocation metadata for cross-device operations

Problem:
In multi-device contexts, each device has its own primary CUDA context. When USM memory allocated on device A is accessed from a queue on device B, cuMemcpyAsync fails because the stream belongs to context B while the memory belongs to context A.

Root cause:
- urUSMSharedAlloc/urUSMDeviceAlloc allocate memory in device-specific contexts.
- urEnqueueUSMMemcpy receives raw pointers with no record of the device that allocated them.
- Cross-context copies require cuMemcpyPeerAsync, which takes the source and destination contexts explicitly.

Solution:
Track allocation metadata in ur_context to record which device allocated each USM pointer. In urEnqueueUSMMemcpy, query this metadata to detect cross-device copies and use cuMemcpyPeerAsync with explicit source and destination contexts.

Changes:
- Add an AllocationMetadata map to ur_context_handle_t with thread-safe access.
- Register allocations in urUSMDeviceAlloc and urUSMSharedAlloc.
- Unregister them in urUSMFree.
- Query the metadata in urEnqueueUSMMemcpy to detect cross-device copies.
- Use cuMemcpyPeerAsync for cross-device copies and cuMemcpyAsync otherwise.

The metadata lookup is O(1) per pointer, so cross-context copies are dispatched correctly up front instead of relying on trial-and-error approaches.
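The register/unregister/query flow described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's code: AllocationRegistry, planCopy, CopyKind, CopyPlan and DeviceId are hypothetical names, and no CUDA calls are made so the sketch runs without a GPU. Two choices differ from, or go beyond, what the message states. First, it uses an ordered std::map and a range lookup so that interior pointers (for example Ptr + offset passed to a memcpy) still resolve to their allocation; that costs O(log n) per lookup, whereas an exact-pointer std::unordered_map would give the O(1) average lookup the message describes but would only match base addresses. Second, the rule for "cross-device" (either endpoint owned by a device other than the queue's) and the fallback for unregistered pointers are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>

// Hypothetical stand-in for a device handle; the adapter uses ur_device_handle_t.
using DeviceId = int;

struct AllocationInfo {
  DeviceId Device;  // device whose context allocated the memory
  std::size_t Size; // allocation size in bytes
};

// Hypothetical registry, analogous to the AllocationMetadata map the commit
// adds to ur_context_handle_t. Keyed by allocation base address.
class AllocationRegistry {
public:
  // Would be called after a successful device/shared allocation.
  void registerAlloc(const void *Base, DeviceId Dev, std::size_t Size) {
    std::unique_lock<std::shared_mutex> Lock(Mutex); // exclusive: writer
    Allocs[toAddr(Base)] = AllocationInfo{Dev, Size};
  }

  // Would be called when the allocation is freed.
  void unregisterAlloc(const void *Base) {
    std::unique_lock<std::shared_mutex> Lock(Mutex);
    Allocs.erase(toAddr(Base));
  }

  // Owning device of any pointer inside a registered allocation,
  // or std::nullopt if the pointer is unknown.
  std::optional<DeviceId> ownerOf(const void *Ptr) const {
    std::shared_lock<std::shared_mutex> Lock(Mutex); // shared: concurrent readers
    const std::uintptr_t Addr = toAddr(Ptr);
    auto It = Allocs.upper_bound(Addr); // first base strictly above Addr
    if (It == Allocs.begin())
      return std::nullopt;
    --It; // largest base <= Addr
    if (Addr < It->first + It->second.Size)
      return It->second.Device;
    return std::nullopt;
  }

private:
  static std::uintptr_t toAddr(const void *P) {
    return reinterpret_cast<std::uintptr_t>(P);
  }

  mutable std::shared_mutex Mutex;
  std::map<std::uintptr_t, AllocationInfo> Allocs;
};

// Which CUDA copy call the enqueue path would issue.
enum class CopyKind {
  SameContext, // cuMemcpyAsync on the queue's stream
  Peer         // cuMemcpyPeerAsync with explicit src/dst contexts
};

struct CopyPlan {
  CopyKind Kind;
  DeviceId SrcDevice; // would select srcContext for cuMemcpyPeerAsync
  DeviceId DstDevice; // would select dstContext for cuMemcpyPeerAsync
};

// Assumptions: unregistered pointers are treated as belonging to the queue's
// device, and a copy is cross-device if either endpoint lives elsewhere.
CopyPlan planCopy(const AllocationRegistry &Reg, DeviceId QueueDevice,
                  const void *Dst, const void *Src) {
  const DeviceId SrcDev = Reg.ownerOf(Src).value_or(QueueDevice);
  const DeviceId DstDev = Reg.ownerOf(Dst).value_or(QueueDevice);
  const bool CrossDevice = SrcDev != QueueDevice || DstDev != QueueDevice;
  return CopyPlan{CrossDevice ? CopyKind::Peer : CopyKind::SameContext,
                  SrcDev, DstDev};
}
```

In use, a buffer registered for device 0 and copied into a buffer registered for device 1 from a queue on device 1 yields CopyKind::Peer with SrcDevice 0 and DstDevice 1; the real path would then map those devices to their primary contexts and pass them to cuMemcpyPeerAsync(dstDevice, dstContext, srcDevice, srcContext, ByteCount, hStream). A copy between two offsets of the same device-1 buffer yields CopyKind::SameContext, i.e. a plain cuMemcpyAsync. The shared_mutex lets concurrent enqueues read the map in parallel while alloc/free take exclusive access.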

Author

kekaczma

Committer

kekaczma

Parents

39266e62
