Synchronize MAGMA functions with the current CUDA stream (#36605)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21821
This follows ngimel's [suggestion](https://github.com/pytorch/pytorch/issues/21821#issuecomment-502968982) to manually synchronize MAGMA calls with the current stream. This is handled automatically with `MagmaStreamSyncGuard`.
For the functions with `_batched` variants, I think we could possibly avoid synchronization by calling the batched variant with a batch of size 1, since those take a `magma_queue_t` argument. However, I presume there's a reason it wasn't written like that in the first place.
I also figured out why porting to ATen ["magically fixed"](https://github.com/pytorch/pytorch/issues/21821#issuecomment-527647971) `torch.svd`. The MAGMA functions for svd all take host arrays as input and output. The ATen port uses blocking `copy_` calls, which fully synchronize the operation. The THC functions, on the other hand, use `cudaMemcpy`, which doesn't synchronize with streams created with the `cudaStreamNonBlocking` flag (which ATen uses). The fix is to use `cudaMemcpyAsync` followed by `cudaStreamSynchronize`, the same as `copy_` does internally:
https://github.com/pytorch/pytorch/blob/835ee34e38eed3f5b35726b40be9c48e75201618/aten/src/ATen/native/cuda/Copy.cu#L192-L193
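The before/after pattern can be sketched roughly as follows (a hedged fragment, not the literal THC diff; `host_dst`, `device_src`, and `nbytes` are placeholder names):

```cpp
// Before: cudaMemcpy is ordered with respect to the legacy default stream
// only, so it does NOT wait for work queued on a stream created with the
// cudaStreamNonBlocking flag. The host can observe stale results.
//   cudaMemcpy(host_dst, device_src, nbytes, cudaMemcpyDeviceToHost);

// After: enqueue the copy on the current stream, then block the host until
// that stream has drained, matching what copy_ does internally.
cudaStream_t stream = at::cuda::getCurrentCUDAStream();
cudaMemcpyAsync(host_dst, device_src, nbytes, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);
```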
I'm not sure how to test these changes, as I wasn't able to reproduce any of the stream sync issues. That is possibly a mixture of non-determinism and the fact that some of these functions are implicitly synchronous anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36605
Differential Revision: D21258265
Pulled By: ngimel
fbshipit-source-id: 76d8f687c605e5e9cd68b97dc1d70a39a13376ec