Make btriunpack work for high dimensional batches and faster than before (#15286)
Summary:
Changelog:
- Optimize btriunpack by using `torch.where` instead of indexing, inplace operations instead of out place operations and avoiding costly permutations by computing the final permutation over a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15286
Differential Revision: D13562038
Pulled By: soumith
fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6