Add support for transformer layout of masked_softmax (#69272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272
In the transformer encoder and MHA, masked_softmax's mask is a 2D tensor of shape (B, D), while the input is a 4D tensor of shape (B, H, D, D).
This mask could simply be broadcast to (B, H, D, D) to match the input and then fed to a regular masked_softmax; however, that would produce a non-contiguous mask and consume more memory.
In this diff, we keep the mask's shape unchanged and compute the corresponding mask element for the input in each CUDA thread.
This new layout is not yet supported on CPU.
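For reference, a minimal sketch of the intended semantics (not the kernel code from this diff); the shapes, the True-means-masked convention, and the -inf fill value are illustrative assumptions:

    import torch

    B, H, D = 2, 4, 8                 # illustrative sizes: batch, heads, sequence length
    scores = torch.randn(B, H, D, D)  # attention scores (the 4D input)
    mask = torch.rand(B, D) > 0.5     # 2D mask; True marks positions to mask out (assumed convention)

    # Reference semantics: for input element (b, h, q, k) the corresponding mask entry is mask[b, k].
    # Broadcasting materializes a (B, H, D, D) mask; the CUDA kernel described here instead reads
    # the original (B, D) mask directly from each thread.
    expanded = mask.view(B, 1, 1, D).expand(B, H, D, D)
    reference = scores.masked_fill(expanded, float("-inf")).softmax(dim=-1)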
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32605557
fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2