Fix AutoTP universal checkpoint CI (#7937)
The full CI run
[fails](https://github.com/deepspeedai/DeepSpeed/actions/runs/23735417401/job/69138729446)
with "RuntimeError: Cannot re-initialize CUDA" in the tests for
universal checkpoint and AutoTP.
The error occurs because those tests call `torch.cuda.current_device()`
under `pytest --forked`. Since they only touch universal checkpoint
metadata, that call is unnecessary. This PR skips constructor-time AutoTP
materialization when `mp_group` is `None`.
Partitioning still happens in real AutoTP usage, where an actual
model-parallel group is provided.
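
The guard described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern, not DeepSpeed's actual class or API; the `AutoTPLayer` name and `_partition` helper are invented for the example:

```python
class AutoTPLayer:
    """Hypothetical sketch of the guard; not the real DeepSpeed class."""

    def __init__(self, weight, mp_group=None):
        self.weight = weight
        self.mp_group = mp_group
        self.partitioned = False
        # Skip constructor-time materialization when no model-parallel
        # group is given (e.g. metadata-only universal-checkpoint tests),
        # so no CUDA call is made under `pytest --forked`.
        if mp_group is not None:
            self._partition()

    def _partition(self):
        # In real AutoTP this step would shard `weight` across the
        # model-parallel group and may touch CUDA (e.g. via
        # torch.cuda.current_device()); here it only records that
        # partitioning ran.
        self.partitioned = True
```

With `mp_group=None` the constructor returns without partitioning, while passing a real group still triggers it, matching the behavior described above.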
---------
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>