[transformer] BT enablement on fairseq - pytorch change (#79186)
The fairseq diff is split into two parts.
The first diff (this one):
This diff adds a function that checks whether a mask is left-aligned, which is the mask condition required for nested tensors. The check is necessary for TorchScript deployment.
The second diff (D37082681):
Forks the inference path inside the forward function. When a checkpoint file has been loaded and the model is running inference, the BT (BetterTransformer) path is used. Otherwise, the existing fairseq path is used.
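
For illustration, the left-alignment check described for the first diff could look roughly like the sketch below. This is not the code added by this PR. The function name `mask_is_left_aligned` is hypothetical, and the convention that `True` marks a real token and `False` marks padding is an assumption. A mask is treated as left-aligned when every row has all of its real tokens first, followed only by padding.

```python
import torch


def mask_is_left_aligned(mask: torch.Tensor) -> bool:
    """Hypothetical helper: return True if every row of `mask` is left-aligned.

    Assumes `mask` has shape (batch, seq_len), with True for real tokens
    and False for padding.
    """
    mask = mask.to(torch.bool)
    # Number of real tokens in each row, shape (batch, 1).
    lengths = mask.sum(dim=1, keepdim=True)
    # Column indices 0..seq_len-1, shape (1, seq_len).
    positions = torch.arange(mask.size(1), device=mask.device).unsqueeze(0)
    # A left-aligned row is True exactly at positions below its length.
    expected = positions < lengths
    return torch.equal(mask, expected)


aligned = torch.tensor([[True, True, False], [True, False, False]])
ragged = torch.tensor([[True, False, True]])
print(mask_is_left_aligned(aligned), mask_is_left_aligned(ragged))
```

If the mask in use follows the opposite convention (True for padding), it would need to be inverted before being passed to this sketch.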
Reviewed By: mikekgfb
Differential Revision: D36057338
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79186
Approved by: https://github.com/erichan1