[LT] Eliminate copy cost in the DDP eager fallback (#73729)
Summary:
This commit extracts the CUDA tensor data directly from the lazy tensor instead
of using the old .to() and copy_() methods, avoiding the extra copy cost.
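To make the saving concrete, here is a toy Python model. It is not the lazy_tensor_core code. `ToyLazyTensor`, `fallback_with_copies`, `fallback_zero_copy` and `double_inplace` are hypothetical names, and a `bytearray` stands in for the CUDA buffer. The sketch assumes the old path copied data out (modelled as .to()), ran the eager op, and copied the result back (modelled as copy_()). The new path runs the op directly on the existing storage through a view.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyLazyTensor:
    """Hypothetical stand-in for a lazy tensor whose data already lives on the device."""
    device_data: bytearray


@dataclass
class CopyStats:
    copies: int = 0


def double_inplace(buf) -> None:
    """Stand-in for a collective (e.g. an allreduce) that writes its result in place."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def fallback_with_copies(t: ToyLazyTensor, op: Callable, stats: CopyStats) -> None:
    # Assumed old approach: copy out (models .to()), run the op, copy back (models copy_()).
    eager = bytearray(t.device_data)
    stats.copies += 1
    op(eager)
    t.device_data[:] = eager
    stats.copies += 1


def fallback_zero_copy(t: ToyLazyTensor, op: Callable, stats: CopyStats) -> None:
    # New approach: hand the op a view that shares the existing storage, so no copy is made.
    with memoryview(t.device_data) as view:
        op(view)


old = ToyLazyTensor(bytearray([1, 2, 3]))
s_old = CopyStats()
fallback_with_copies(old, double_inplace, s_old)

new = ToyLazyTensor(bytearray([1, 2, 3]))
s_new = CopyStats()
fallback_zero_copy(new, double_inplace, s_new)

print(s_old.copies, s_new.copies)  # 2 0
```

Both paths leave the same result in the tensor's storage. Only the first one pays for two full buffer copies per call.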
Test Plan:
LTC_TS_CUDA=1 gpurun python lazy_tensor_core/test/ddp/ddp_correctness.py