Reduce the number of instantiations for the bernoulli tensor-tensor kernel (#70169)
Summary:
Reduces the binary size of DistributionBernoulli.cu from 12282600 to 3946792.
Tensor-tensor bernoulli kernels are rarely used, so we limit dispatches to a `double` probability type for a double `self` tensor and a `float` probability type for everything else. This incurs a minor perf hit when the probability tensor has a different dtype, but given how rarely these kernels are used (and how rarely the probability tensor is not float), this is not a problem.
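The type selection described above can be sketched in plain C++ as follows. This is only an illustration, not the code from this PR: `prob_type_for` and `bernoulli_sketch` are hypothetical names, the loop runs on the host rather than in a CUDA kernel, and the uniform samples are passed in explicitly so the result is deterministic. It assumes that a probability tensor of another dtype would be converted to the selected type before the kernel runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper (not from the PR): the single probability type used
// for a given `self` scalar type. double -> double, everything else -> float.
template <typename self_t>
using prob_type_for =
    std::conditional_t<std::is_same_v<self_t, double>, double, float>;

// Hypothetical host-side stand-in for the tensor-tensor kernel. It is
// templated on self_t only, so there is one instantiation per self type
// instead of one per (self type, probability type) pair.
template <typename self_t>
std::vector<self_t> bernoulli_sketch(
    const std::vector<prob_type_for<self_t>>& p,
    const std::vector<double>& u /* uniform samples in [0, 1) */) {
  std::vector<self_t> out(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) {
    // Draw 1 when the uniform sample falls below the probability, else 0.
    out[i] = static_cast<self_t>(u[i] < static_cast<double>(p[i]) ? 1 : 0);
  }
  return out;
}
```

With this shape, a call such as `bernoulli_sketch<int64_t>(p, u)` accepts only a `float` probability vector, and `bernoulli_sketch<double>(p, u)` accepts only a `double` one, which is where the reduction in instantiations comes from.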
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70169
Reviewed By: jbschlosser
Differential Revision: D33237890
Pulled By: ngimel
fbshipit-source-id: 185c4b97aba0fb6ae159d572dd5bbb13cf676bb4