Enable replicated embedding in SPMD for NLP models (#98686)
For models like NanoGPT, the embedding weights are replicated while
the input ids are sharded. In this case, the output of the embedding
lookup should be sharded to match the sharding of the ids.
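
A minimal sketch of why the output follows the ids' sharding, written in plain Python rather than against the actual SPMD code touched by this PR. The helpers `embedding_lookup` and `shard`, the two-rank setup, and the table values are all hypothetical and only simulate ranks inside one process:

```python
# Hypothetical single-process simulation of SPMD ranks; not the PR's implementation.

def embedding_lookup(table, ids):
    """Return the row of `table` selected by each id."""
    return [table[i] for i in ids]


def shard(seq, world_size):
    """Split `seq` into `world_size` contiguous, equal-sized chunks."""
    chunk = len(seq) // world_size
    return [seq[r * chunk:(r + 1) * chunk] for r in range(world_size)]


WORLD_SIZE = 2  # assumed number of ranks

# Replicated: every rank holds the full 4x2 embedding table.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]

# Sharded: each rank holds only its slice of the input ids.
ids = [3, 0, 2, 1]
local_ids = shard(ids, WORLD_SIZE)

# Each rank looks up its local ids in its own full copy of the table.
local_out = [embedding_lookup(table, local_ids[r]) for r in range(WORLD_SIZE)]

# The per-rank outputs are the shards of the unsharded lookup, so the
# output can be treated as sharded the same way as the ids.
full_out = embedding_lookup(table, ids)
assert local_out == shard(full_out, WORLD_SIZE)
```

Because each output row depends only on its own id and the (fully available) table, no communication between ranks is needed in this sketch.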
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98686
Approved by: https://github.com/yifuwang