[CUDA][UR] Track USM allocation metadata for cross-device operations

- Add metadata tracking to ur_context to map USM pointers to devices
- Use cuMemcpyPeerAsync for cross-device USM copies
- Enable urEnqueueKernelLaunchIncrementMultiDeviceTest for CUDA

This fixes cross-device USM copies. cuMemcpyAsync silently fails when
the source and destination pointers belong to different CUDA contexts.
Each device has its own primary context, so the context now records
which device allocated each USM pointer, and the copy path switches to
cuMemcpyPeerAsync when the source and destination devices differ.
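
The tracking and dispatch described above can be sketched roughly as
follows. This is a minimal, CUDA-free illustration, not the adapter's
actual code: UsmAllocTracker, UsmAllocInfo, CopyPath and chooseCopyPath
are hypothetical names, and the device index stands in for whatever
device handle the real metadata stores. In the adapter, the Peer case
would correspond to a cuMemcpyPeerAsync call (which takes the
destination pointer and context, the source pointer and context, a byte
count and a stream), and the SameContext case to cuMemcpyAsync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <mutex>
#include <optional>

// Hypothetical per-allocation record: size in bytes and owning device index.
struct UsmAllocInfo {
  std::size_t Size;
  int DeviceIndex;
};

// Hypothetical tracker, assumed to live in the context object.
// Maps allocation base addresses to their metadata.
class UsmAllocTracker {
public:
  // Record an allocation made on the given device.
  void add(const void *Base, std::size_t Size, int DeviceIndex) {
    assert(Size > 0 && "zero-sized allocations are not tracked");
    std::lock_guard<std::mutex> Lock(Mutex);
    Allocs[reinterpret_cast<std::uintptr_t>(Base)] = {Size, DeviceIndex};
  }

  // Forget an allocation when it is freed.
  void remove(const void *Base) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Allocs.erase(reinterpret_cast<std::uintptr_t>(Base));
  }

  // Return the owning device of any pointer inside a tracked allocation,
  // or std::nullopt if the pointer is unknown (e.g. plain host memory).
  std::optional<int> deviceOf(const void *Ptr) const {
    std::lock_guard<std::mutex> Lock(Mutex);
    const auto Addr = reinterpret_cast<std::uintptr_t>(Ptr);
    auto It = Allocs.upper_bound(Addr); // first base strictly above Addr
    if (It == Allocs.begin())
      return std::nullopt;
    --It; // candidate allocation whose base is <= Addr
    if (Addr - It->first < It->second.Size)
      return It->second.DeviceIndex;
    return std::nullopt;
  }

private:
  mutable std::mutex Mutex;
  std::map<std::uintptr_t, UsmAllocInfo> Allocs;
};

enum class CopyPath { SameContext, Peer };

// Pick the peer path only when both pointers are tracked and are owned by
// different devices; everything else keeps the existing same-context path.
CopyPath chooseCopyPath(const UsmAllocTracker &Tracker, const void *Dst,
                        const void *Src) {
  const std::optional<int> DstDev = Tracker.deviceOf(Dst);
  const std::optional<int> SrcDev = Tracker.deviceOf(Src);
  if (DstDev && SrcDev && *DstDev != *SrcDev)
    return CopyPath::Peer;
  return CopyPath::SameContext;
}
```

The ordered map lets interior pointers (base plus offset) resolve to
their allocation, which matters because copies often start partway into
a buffer. Untracked pointers deliberately fall back to the same-context
path so that behaviour for host memory is unchanged in this sketch.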