pytorch
a2cb94d2 - [PT-D][Sharding] Enable more ops needed in the transformer model training

Commit
2 years ago
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77214

While working with the MetaSeq model code base, we found that many ops are not supported by sharded tensor. https://github.com/pytorch/pytorch/pull/75374 already enabled most of these ops; this PR/diff enables the rest. It also fixes some unit test errors.

Differential Revision: [D36302780](https://our.internmc.facebook.com/intern/diff/D36302780/)

Approved by: https://github.com/wanchaol, https://github.com/pritamdamania87