Remove extra workspace queries in matrix inverse computation (#20904)
Summary:
Previously, the workspace size query and the workspace allocation were performed inside the batch loop.
However, every matrix in a batch has the same number of rows and columns, so the optimal workspace size is the same for all of them, and repeating the query and allocation for each matrix is redundant.
This PR moves the workspace size query and allocation outside the loop, saving (batch_size - 1) queries and allocations (and, consequently, the matching deallocations).
As a result of this change, batched inverse computation is substantially faster.
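The pattern can be sketched as follows. This is a minimal illustration, not the code from this PR: `mock_getri` is a hypothetical stand-in that only mimics the LAPACK convention of treating `lwork == -1` as a workspace size query (the optimal size is returned in `work[0]`), and the size it reports is made up. The two driver functions and the query counter are likewise assumptions introduced for the example.

```cpp
#include <cstddef>
#include <vector>

// Counts how many workspace queries were issued (illustration only).
static int g_query_count = 0;

// Hypothetical stand-in for a LAPACK-style inverse routine.
// Convention mimicked here: lwork == -1 means "workspace query", in which
// case the optimal workspace size is written to work[0] and nothing else
// is computed.
void mock_getri(int n, double* a, double* work, int lwork) {
  if (lwork == -1) {
    ++g_query_count;
    work[0] = static_cast<double>(n) * 64;  // made-up optimal size
    return;
  }
  // A real routine would overwrite `a` with its inverse here.
  (void)a;
  (void)work;
}

// Old pattern: query and allocate the workspace for every matrix.
int invert_batch_per_matrix_query(std::vector<double>& batch, int n,
                                  int batch_size) {
  g_query_count = 0;
  for (int i = 0; i < batch_size; ++i) {
    double* a = batch.data() + static_cast<std::size_t>(i) * n * n;
    double wkopt = 0.0;
    mock_getri(n, a, &wkopt, -1);  // redundant after the first iteration
    std::vector<double> work(static_cast<std::size_t>(wkopt));
    mock_getri(n, a, work.data(), static_cast<int>(work.size()));
  }
  return g_query_count;
}

// New pattern: query and allocate once, then reuse the workspace.
int invert_batch_single_query(std::vector<double>& batch, int n,
                              int batch_size) {
  g_query_count = 0;
  if (batch_size == 0) {
    return g_query_count;  // nothing to invert, so no query is needed
  }
  double wkopt = 0.0;
  mock_getri(n, batch.data(), &wkopt, -1);  // all matrices are n x n
  std::vector<double> work(static_cast<std::size_t>(wkopt));
  for (int i = 0; i < batch_size; ++i) {
    double* a = batch.data() + static_cast<std::size_t>(i) * n * n;
    mock_getri(n, a, work.data(), static_cast<int>(work.size()));
  }
  return g_query_count;
}
```

For a batch of 8 matrices, the first function issues 8 queries and 8 allocations, while the second issues one of each, which is the (batch_size - 1) saving described above.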
Changelog:
- Move workspace query and allocation outside the batch loop
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20904
Differential Revision: D15495505
Pulled By: ezyang
fbshipit-source-id: 226729734465fcaf896f86e1b1a548a81440e082