[dtensor] lazy init process groups in device mesh (#96700)
This PR adds a private flag that allows process groups to be initialized lazily.
It replaces the previous `dim_groups` arg, which no one is using anymore.
Lazy initialization can help avoid creating process groups when they are not needed.
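As a rough illustration of the idea only, here is a minimal sketch of flag-controlled lazy initialization. It is not the code from this PR: `LazyMeshSketch`, `init_process_groups`, `make_group`, and `get_dim_groups` are hypothetical names, and plain strings stand in for real process groups so that the example runs without `torch.distributed`. Creating the groups on first access is also an assumption of the sketch, not something this message states about the actual flag.

```python
from typing import Callable, List, Optional


class LazyMeshSketch:
    """Toy stand-in for a device mesh; NOT the real DeviceMesh API."""

    def __init__(
        self,
        mesh_dims: int,
        make_group: Callable[[int], object],
        init_process_groups: bool = True,  # hypothetical flag name
    ) -> None:
        self._mesh_dims = mesh_dims
        self._make_group = make_group
        self._dim_groups: Optional[List[object]] = None
        # Eager path: build one group per mesh dimension right away.
        if init_process_groups:
            self._init_groups()

    def _init_groups(self) -> None:
        self._dim_groups = [self._make_group(d) for d in range(self._mesh_dims)]

    def get_dim_groups(self) -> List[object]:
        # Lazy path (an assumption of this sketch): build the groups
        # the first time they are requested, then reuse them.
        if self._dim_groups is None:
            self._init_groups()
        return self._dim_groups


created: List[int] = []


def fake_group(dim: int) -> str:
    # Records each creation so the deferred behavior is observable.
    created.append(dim)
    return f"group-{dim}"


mesh = LazyMeshSketch(2, fake_group, init_process_groups=False)
groups_before_access = list(created)  # nothing has been created yet
groups = mesh.get_dim_groups()        # groups are created here
```

With the flag off, constructing the mesh creates nothing; the cost is paid only if a caller actually asks for the groups.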
Differential Revision: [D44044664](https://our.internmc.facebook.com/intern/diff/D44044664)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96700
Approved by: https://github.com/fduwjj, https://github.com/XilunWu