[CUDA] Revert to not tracking USM Shared allocations
Several approaches were tested extensively, including:
- Detecting memory type (Managed vs Device) and using different APIs
- Using cuMemcpyPeerAsync for all cross-device copies
- Synchronizing streams before peer copies
None of these approaches worked for Managed Memory cross-device copies.
Current hypothesis: CUDA Managed Memory may not support explicit memcpy
operations between GPUs the same way it does for CPU<->GPU transfers.
Revert to letting the CUDA runtime handle Managed Memory migration
automatically.