Fix Python DataParallel RNN in no_grad mode (#21197)
Summary:
Fixes #21108
When grad is disabled, Python autograd function outputs are [wrapped as detached aliases](https://github.com/pytorch/pytorch/blob/8cde4c4d223d3eb1179f87fd6336d25c189acb98/torch/csrc/autograd/python_function.cpp#L395-L399), which prevents calling `Tensor.set_()` on them after the recent changes to Tensors and Variables. This becomes a problem when users call `rnn.flatten_parameters()` in the forward pass, since that function [calls `set_()`](https://github.com/pytorch/pytorch/blob/9d09f5df6c8126440e0f7da1235b6eaf748698e8/aten/src/ATen/native/cudnn/RNN.cpp#L669).
The proposed solution is to skip the autograd `Broadcast` and copy tensors directly when grad mode is disabled, so the replicas are plain tensors rather than autograd-function outputs.
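A minimal sketch of the idea (not the actual PyTorch diff; `broadcast_like` is a hypothetical helper standing in for the replication step in `DataParallel`): branch on `torch.is_grad_enabled()`, and in no-grad mode produce plain copies that still support in-place `set_()`.

```python
import torch

def broadcast_like(tensor, devices):
    # Hypothetical helper illustrating the fix: only go through an
    # autograd-aware broadcast when gradients are actually needed.
    if torch.is_grad_enabled():
        # Autograd path: replicas stay attached to the graph
        # (stand-in for comm.broadcast / Broadcast.apply).
        return [tensor.to(d) for d in devices]
    # no_grad path: plain copies, never wrapped as detached aliases of an
    # autograd-function output, so Tensor.set_() remains legal on them.
    with torch.no_grad():
        return [tensor.to(d).clone() for d in devices]

x = torch.randn(2, 2, requires_grad=True)
with torch.no_grad():
    replicas = broadcast_like(x, ["cpu"])
    # This set_() call is exactly what flatten_parameters() needs to do,
    # and it would fail on a detached-alias output of an autograd function.
    replicas[0].set_(torch.zeros(2, 2))
```

The same check is what lets `rnn.flatten_parameters()` run inside the forward pass under `torch.no_grad()` without tripping the detached-alias restriction.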
apsdehal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21197
Differential Revision: D15577342
Pulled By: mrshenli
fbshipit-source-id: 1a024c572171a3f2daca9454fd3ee6450d112f7c