pytorch
a2cb94d2 - [PT-D][Sharding] Enable more ops needed in the transformer model training

Commit

2 years ago

[PT-D][Sharding] Enable more ops needed in the transformer model training

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77214

While working with the MetaSeq model code base, we found that many ops are not supported by sharded tensor. https://github.com/pytorch/pytorch/pull/75374 already enabled most of them; this PR/diff enables the rest. It also fixes some unit test errors.

Differential Revision: [D36302780](https://our.internmc.facebook.com/intern/diff/D36302780/)

Approved by: https://github.com/wanchaol, https://github.com/pritamdamania87

Author

fduwjj

Committer

pytorchmergebot

Parents

e5a5cd14
