[CUDA][UR] Add fallback for cross-device USM memcpy

- Try cuMemcpyPeerAsync first, since it states the cross-device intent explicitly
- Fall back to cuMemcpyAsync if the peer copy fails (e.g., no P2P support)
- cuMemcpyAsync still works for managed memory because managed allocations migrate automatically
- Add null checks for safety in allocation registration
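
Sketch of the fallback order described above, kept free of CUDA headers so it builds anywhere. `Status`, `kSuccess`, `CopyFn`, and `copyWithFallback` are hypothetical names for illustration only and are not the adapter's actual code. In the adapter, the two callables would presumably wrap `cuMemcpyPeerAsync(dst, dstCtx, src, srcCtx, size, stream)` and `cuMemcpyAsync(dst, src, size, stream)`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for CUresult; 0 mirrors CUDA_SUCCESS.
using Status = int;
constexpr Status kSuccess = 0;

// Each callable enqueues one kind of copy and returns its status.
using CopyFn = std::function<Status()>;

// Try the peer copy first; if it fails, fall back to the generic copy.
Status copyWithFallback(const CopyFn &peerCopy, const CopyFn &genericCopy) {
  assert(peerCopy && genericCopy && "both copy paths must be provided");
  if (peerCopy() == kSuccess)
    return kSuccess;
  // Peer path unavailable (for example, no P2P support): use the generic
  // path and report its status to the caller.
  return genericCopy();
}
```

The generic copy is only attempted when the peer copy reports a failure, and its status is what the caller sees in that case, so a failure on both paths is not hidden.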