Use fn(param) instead of fn(param.data) in nn.Module._apply (#21865)
Summary:
When we pass `fn` to `nn.Module._apply()` and `fn` is an in-place operation, the correct behavior should also include bumping the parameters' and their gradients' version counters. This PR fixes the old incorrect behavior and makes sure the new behavior is right.
Note that this PR is BC-breaking in the following way:
Previously, passing an in-place operation to `nn.Module._apply()` does not bump the module's parameters' and their gradients' version counters. After this PR, the module's parameters' and their gradients' version counters will be correctly bumped by the in-place operation, which will invalidate them in any autograd graph they previously participate in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21865
Differential Revision: D15881952
Pulled By: yf225
fbshipit-source-id: 62f9244a4283a110147e9f20145ff232a5579fbd