Add reparameterization support to `OneHotCategorical` (#46610)
Summary:
Add reparameterization support to the `OneHotCategorical` distribution. Samples are reparameterized with the straight-through gradient estimator proposed in [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/abs/1308.3432): the forward pass returns a discrete one-hot sample, while the backward pass propagates gradients as if the sample were the category probabilities.
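
For illustration, a minimal sketch of the straight-through idea, not the code added in this PR. The helper name `straight_through_sample` and the toy `weights` loss are hypothetical; the sketch only assumes the existing `OneHotCategorical.sample()` and `Tensor.detach()` calls.

```python
import torch
from torch.distributions import OneHotCategorical


def straight_through_sample(probs: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: one-hot sample with straight-through gradients."""
    # Forward pass: an ordinary, non-differentiable one-hot sample.
    hard = OneHotCategorical(probs=probs).sample()
    # (probs - probs.detach()) is zero in value but has an identity gradient
    # with respect to probs, so backward treats the sample as if it were probs.
    return hard + (probs - probs.detach())


logits = torch.zeros(4, requires_grad=True)
probs = torch.softmax(logits, dim=-1)
sample = straight_through_sample(probs)

# Toy downstream loss that weights each category differently (illustrative).
weights = torch.tensor([1.0, 2.0, 3.0, 4.0])
loss = (sample * weights).sum()
loss.backward()  # logits.grad is populated even though the sample is discrete
```

The returned value is still exactly one-hot; only the gradient path changes. The estimator is biased, since the backward pass ignores the sampling step.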
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46610
Reviewed By: neerajprad
Differential Revision: D25272883
Pulled By: ezyang
fbshipit-source-id: 8364408fe108a29620694caeac377a06f0dcdd84