Add cpu/gloo tests for sharded tensor distributed checkpoint (#80997)
ShardedTensor checkpoint does not depend on NCCL, so replicate the GPU test cases on CPU with the gloo backend, which also enables development on macOS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80997
Approved by: https://github.com/wanchaol