Introduce distributed checkpoint with ShardedTensor.
This is a copy of #76123.
I had to create a new PR due to infra limitations, so please see #76123 for the comment history.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76897
Approved by: https://github.com/wanchaol