[PT-D][BE][TP perf 1/N] Get rid of unnecessary collectives in Embedding/EmbeddingBag and use autograd-enabled collectives (#81853)
The Embedding and EmbeddingBag ops for ShardedTensor, especially with row-wise sharding, are very inefficient and hard to fit into the future design. This PR therefore aims to:
1. Remove all unnecessary collective communications: only one gather and one reduce (or reduce-scatter) is needed.
2. Use autograd-enabled collectives so that these ops can be used in real model training.
3. Do some minor code cleanup.
4. Treat the input differently when it is a replicated tensor. (More handling for this will be added in the next few PRs.)
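The communication pattern in point 1 can be illustrated with a minimal, single-process sketch. This is not the actual ShardedTensor implementation: the rank loops below stand in for the real `all_gather` and `reduce_scatter` collectives, and all names (`sharded_embedding`, `shards`, etc.) are illustrative. The idea is that each rank owns a contiguous row shard of the embedding table; the inputs are gathered once so every rank can look up the rows it owns (contributing zeros otherwise), and a single reduce-scatter sums the partial lookups and returns each rank its own slice of the output.

```python
WORLD_SIZE = 2
DIM = 3
NUM_EMBEDDINGS = 4  # rows 0-1 on rank 0, rows 2-3 on rank 1

# Row-wise shards of the embedding table, one list of rows per rank.
shards = [
    [[float(r * DIM + c) for c in range(DIM)] for r in range(0, 2)],  # rank 0
    [[float(r * DIM + c) for c in range(DIM)] for r in range(2, 4)],  # rank 1
]

# Per-rank input indices into the global embedding table.
inputs = [[0, 3], [2, 1]]

def sharded_embedding(shards, inputs):
    rows_per_rank = NUM_EMBEDDINGS // WORLD_SIZE
    # Step 1: "all_gather" the per-rank inputs so every rank sees all indices.
    gathered = [idx for rank_inputs in inputs for idx in rank_inputs]
    # Step 2: every rank looks up the gathered indices against its own shard,
    # contributing a zero row for any index it does not own.
    partials = []
    for src_rank in range(WORLD_SIZE):
        lo = src_rank * rows_per_rank
        partials.append([
            shards[src_rank][idx - lo] if lo <= idx < lo + rows_per_rank
            else [0.0] * DIM
            for idx in gathered
        ])
    # Step 3: "reduce_scatter" -- sum the partial lookups elementwise, then
    # hand each rank back only the rows for its own original inputs.
    reduced = [
        [sum(p[i][c] for p in partials) for c in range(DIM)]
        for i in range(len(gathered))
    ]
    per_rank = len(inputs[0])
    return [reduced[r * per_rank:(r + 1) * per_rank] for r in range(WORLD_SIZE)]

out = sharded_embedding(shards, inputs)
# out[0] holds rank 0's lookups for indices [0, 3];
# out[1] holds rank 1's lookups for indices [2, 1].
```

In the real op these two steps map onto autograd-enabled collectives (point 2), so gradients flow back through the gather and reduce-scatter to the local shard during training.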
Differential Revision: [D37965687](https://our.internmc.facebook.com/intern/diff/D37965687/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81853
Approved by: https://github.com/wanchaol