Avoid 2 extra copies when reducing sparse tensors and fix result() vs inplace output discrepancy (#57822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57822
* `AsyncSparseAllreduceWork` can avoid copying output tensors, since all results are kept alive by modifying the input vector directly
* `AsyncSparseAllreduceWork` now returns the inputs back to the user instead of the former behavior of returning copies of the inputs. This is consistent with other operations and process group implementations
* `AsyncSparseAllreduceCUDAWork` now copies tensors directly from CPU into the input tensors, avoiding the extra copy chain `output` -> `outputs` -> `inputs`. The inputs are returned back to the user. This is consistent with other operations and process group implementations.

Overall, `AsyncSparseAllreduceCUDAWork` now avoids 2 extra copies (since `AsyncSparseAllreduceCUDAWork` reuses `AsyncSparseAllreduceWork`'s implementation)
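As a rough sketch of the pattern (plain Python, not the actual C++ implementation; the function names and list-of-lists "tensors" are hypothetical stand-ins), the change amounts to writing reduced results directly into the caller's input container and returning that same container, so `result()` and the in-place outputs refer to the same objects:

```python
# Hypothetical sketch of the copy-avoidance pattern described above.
# Lists of lists stand in for sparse tensors.

def allreduce_copying(inputs, reduced):
    # Old behavior: materialize fresh copies of the reduced tensors,
    # then copy them again into the user-visible result.
    outputs = [list(t) for t in reduced]   # extra copy 1: reduced -> outputs
    results = [list(t) for t in outputs]   # extra copy 2: outputs -> results
    return results                         # result() differs from inputs

def allreduce_inplace(inputs, reduced):
    # New behavior: write the reduced values directly into the caller's
    # input vector; the inputs keep the results alive, no intermediates.
    for i, t in enumerate(reduced):
        inputs[i] = t
    return inputs                          # result() is the inputs themselves

inputs = [[1, 2], [3, 4]]
reduced = [[2, 4], [6, 8]]
out = allreduce_inplace(inputs, reduced)
assert out is inputs                       # no result()/in-place discrepancy
```

This also illustrates the `result()` vs in-place discrepancy in the title: with the copying variant, the returned tensors are distinct objects from the inputs the user passed in.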
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28298325
Pulled By: agolynski
fbshipit-source-id: 18e2104413cdf5e73a01aad464e2613807779297