pytorch
03cc0f58 - Don't create large intermediary tensors in the backward of matmul (#95261)


Commit

1 year ago

Don't create large intermediary tensors in the backward of matmul (#95261)

Currently, if we multiply a transposed batch of matrices with shape [b, m, n] and a matrix with shape [n, k], when computing the gradient of the matrix, we instantiate a matrix of shape [b, n, k]. This may be a very large matrix. Instead, we fold the batch of matrices into a matrix, which avoids creating any large intermediary tensor.

Note that multiplying a batch of matrices and a matrix naturally occurs within an attention module, so this case surely happens in the wild. In particular, this issue was found while investigating the OOMs caused by the improved folding algorithm in the next PR of this stack. See https://github.com/pytorch/pytorch/pull/76828#issuecomment-1432359980

This PR fixes those OOMs and decreases the memory footprint of the backward of matmul.

I understand this is a tricky one, so I put it on its own PR to discuss it.

Differential Revision: [D43541495](https://our.internmc.facebook.com/intern/diff/D43541495)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95261
Approved by: https://github.com/ezyang
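
The folding idea in the message can be illustrated with a small Python sketch. This is not the code changed by the PR (that lives in PyTorch's C++ internals); the helper names `grad_b_naive` and `grad_b_folded` are hypothetical, and the shapes are chosen only for illustration. It compares a batched product that materializes a [b, n, k] intermediate against a folded 2D product that produces the [n, k] gradient directly.

```python
import torch


def grad_b_naive(a, grad_out):
    # a: [b, m, n], grad_out: [b, m, k]
    # The batched product materializes a [b, n, k] tensor before the
    # reduction over the batch dimension.
    return (a.transpose(-2, -1) @ grad_out).sum(dim=0)


def grad_b_folded(a, grad_out):
    # Fold the batch and row dimensions together so that a single 2D
    # matmul yields the [n, k] gradient with no [b, n, k] intermediate.
    # Note: reshape may copy a non-contiguous input, but that copy has
    # b * m * n elements (the size of `a`), not b * n * k.
    b, m, n = a.shape
    k = grad_out.shape[-1]
    a2d = a.reshape(b * m, n)
    g2d = grad_out.reshape(b * m, k)
    return a2d.t() @ g2d


torch.manual_seed(0)
b, m, n, k = 4, 5, 3, 2

# A transposed batch of matrices: stored as [b, n, m], viewed as [b, m, n].
a = torch.randn(b, n, m, dtype=torch.float64).transpose(-2, -1)
w = torch.randn(n, k, dtype=torch.float64, requires_grad=True)

out = a @ w                       # [b, m, k]
grad_out = torch.randn_like(out)  # stand-in for the upstream gradient
out.backward(grad_out)

naive = grad_b_naive(a, grad_out)
folded = grad_b_folded(a, grad_out)
```

Both helpers should agree with each other and with `w.grad` from autograd up to floating-point tolerance; the difference is only in the size of the temporary tensor each one allocates.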

Author

lezcano

Committer

pytorchmergebot

Parents

fd8367a7
