Do not register USM Shared allocations for cross-device tracking

Root cause: CUDA Managed Memory (USM Shared) is migrated between
devices automatically by the CUDA runtime. Manual peer-to-peer copies
issued with cuMemcpyPeerAsync conflict with that automatic migration,
so data did not propagate correctly across devices.

Solution: Register only USM Device allocations in the metadata
tracking. When urEnqueueUSMMemcpy encounters an untracked pointer
(Shared memory), it uses cuMemcpyAsync, which lets the CUDA runtime
handle migration transparently through the Unified Memory subsystem.
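
The dispatch rule described above can be sketched roughly as follows.
This is a simplified illustration, not the adapter's actual code: the
names DeviceAllocRegistry, UsmKind, CopyPath and chooseCopyPath are
hypothetical, lookup is by base pointer only, and the sketch assumes
the peer path is taken only when both pointers are tracked Device
allocations owned by different devices. PeerAsync stands in for
cuMemcpyPeerAsync and PlainAsync for cuMemcpyAsync.

```cpp
#include <unordered_map>

// Hypothetical types for illustration; not the adapter's real API.
enum class UsmKind { Device, Shared };
enum class CopyPath { PeerAsync, PlainAsync };

class DeviceAllocRegistry {
public:
  // Record only USM Device allocations. Shared (managed) allocations
  // are deliberately left untracked so the CUDA runtime migrates them.
  void onAlloc(const void *ptr, UsmKind kind, int deviceId) {
    if (kind == UsmKind::Device)
      owner_[ptr] = deviceId;
  }

  void onFree(const void *ptr) { owner_.erase(ptr); }

  // Returns true and sets deviceId if ptr is a tracked base pointer.
  bool lookup(const void *ptr, int &deviceId) const {
    auto it = owner_.find(ptr);
    if (it == owner_.end())
      return false;
    deviceId = it->second;
    return true;
  }

private:
  std::unordered_map<const void *, int> owner_;
};

// Assumed policy: take the peer path only when both ends are tracked
// Device allocations on different devices. Any untracked pointer
// (for example USM Shared) falls back to the plain async copy.
CopyPath chooseCopyPath(const DeviceAllocRegistry &reg, const void *dst,
                        const void *src) {
  int dstDev = 0, srcDev = 0;
  if (reg.lookup(dst, dstDev) && reg.lookup(src, srcDev) &&
      dstDev != srcDev)
    return CopyPath::PeerAsync;
  return CopyPath::PlainAsync;
}
```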

This fixes the urEnqueueKernelLaunchIncrementMultiDeviceTest failures
("Expected: 3-9, Got: 2"): kernels incrementing shared memory across
devices did not see updated values because cuMemcpyPeerAsync bypassed
CUDA's automatic migration.