pytorch
cf17fd6d - Fix multinomial CUDA misalignment and non-deterministic behavior (#55364)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/46702

- Fails on probability distributions with an odd number of items: the kernel tries to access an `acc_type` (`float`) in memory aligned for `scalar_t` (`float16`).
- Produces unrepeatable results for large input tensors: the parallel cumsum is not monotonic at some positions.

### Fixes

- Computing the cumsum on `acc_type` (`float`) instead of `scalar_t` (`float16`) fixes both issues.
- The non-monotonic behavior may still occur even with `float`; in those cases, deterministic behavior is achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364

Reviewed By: mruberry

Differential Revision: D28031666

Pulled By: ngimel

fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de
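For context, here is a minimal sketch of the `atomicMax` idea described above. This is not PyTorch's actual kernel; `sample_one_kernel`, `cumdist`, and `out_index` are hypothetical names. It assumes the cumulative distribution has already been computed in `float` (the `acc_type`), even when the probabilities are `float16`, and that each thread scans a strided slice of it looking for the bucket containing a uniform draw `u`.

```cuda
// Hypothetical illustration of the fix, not the actual PyTorch kernel.
// Assumes `cumdist` is an inclusive cumsum computed in float (the acc_type)
// and that `*out_index` was initialized to -1 on the host before launch.
__global__ void sample_one_kernel(const float* cumdist, // cumsum in acc_type
                                  int n_categories,
                                  float u,              // uniform draw in [0, 1)
                                  int* out_index) {
    for (int i = threadIdx.x; i < n_categories; i += blockDim.x) {
        float lo = (i == 0) ? 0.0f : cumdist[i - 1];
        // Bucket i covers [lo, cumdist[i]). If the parallel cumsum is
        // non-monotonic at some positions, more than one bucket can
        // appear to contain `u`.
        if (u >= lo && u < cumdist[i]) {
            // atomicMax makes the winner deterministic (the largest
            // matching index) rather than whichever thread stores last.
            atomicMax(out_index, i);
        }
    }
}
```

Without the atomic, each matching thread would race on a plain store to `*out_index`, so the sampled category could differ between otherwise identical runs; `atomicMax` collapses the race to a fixed, schedule-independent result.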