Speed up bernoulli_scalar_cuda_kernel with a grid-stride loop (#21300)
Summary:
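For reference, a grid-stride loop lets a fixed-size grid of threads cover a tensor of any length. Each thread starts at its global index and advances by the total number of threads in the grid until it runs past the end, so one thread may handle several elements. The sketch below emulates that indexing on the CPU in Python. It is not the kernel changed by this PR: `grid_stride_indices` and `bernoulli_scalar_emulated` are hypothetical helpers, and Python's `random` stands in for the GPU random number generator.

```python
import random


def grid_stride_indices(block_idx, thread_idx, block_dim, grid_dim, n):
    """Indices that one simulated CUDA thread visits in a grid-stride loop.

    Mirrors the usual CUDA pattern:
        for (i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += blockDim.x * gridDim.x) { ... }
    """
    start = block_idx * block_dim + thread_idx  # this thread's global index
    stride = block_dim * grid_dim               # total threads in the grid
    return range(start, n, stride)              # empty if start >= n


def bernoulli_scalar_emulated(n, p, block_dim=4, grid_dim=2, seed=0):
    """Hypothetical CPU emulation: fill n outputs with Bernoulli(p) draws
    (1.0 with probability p, else 0.0), distributing the work over a
    fixed grid of block_dim * grid_dim simulated threads."""
    rng = random.Random(seed)  # stand-in for a per-thread GPU RNG
    out = [None] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            for i in grid_stride_indices(block_idx, thread_idx,
                                         block_dim, grid_dim, n):
                out[i] = 1.0 if rng.random() < p else 0.0
    return out
```

With the defaults, 8 simulated threads cover `n = 20` elements; thread 1 of block 0 visits indices 1, 9 and 17. The grid size stays fixed as `n` grows, and the loop, rather than the launch configuration, absorbs the extra work.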
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21300
ghimport-source-id: c314c28cb693b554d6f24de235c11ba24ed6bf61
Reviewed By: jerryzh168
Differential Revision: D15632935
Pulled By: ezyang
fbshipit-source-id: 9bb24f17d78151bf50942905c967bdcfe1ff00cb