[JIT] Propagate requires_grad to autodiff subgraphs (#71666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71666
When JIT autodiff constructs a gradient computation graph, it only adds gradients for tensors that require them. Previously, `requires_grad` information was **not** propagated to the subgraph that autodiff used; as a result, autodiff would compute *all* gradients, even for tensors whose `requires_grad` was never set during profiling runs. In certain cases this causes performance issues. For example, during training the gradient of the input data is not needed, but it was still computed.
This change propagates `requires_grad` to the subgraph passed to autodiff, so that autodiff does not compute unnecessary gradients.
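
A minimal sketch of the scenario described above (the function and tensor names are illustrative, not part of this PR): the input batch does not require gradients while the weights do, so after this change the differentiable subgraph built by the profiling executor should not compute a gradient for the input.

```python
import torch

@torch.jit.script
def affine(x, w, b):
    return torch.nn.functional.linear(x, w, b)

x = torch.randn(8, 16)                       # input data: requires_grad is False
w = torch.randn(32, 16, requires_grad=True)  # weight: requires_grad is True
b = torch.randn(32, requires_grad=True)

# The profiling executor needs a few runs to specialize the graph and build
# the autodiff subgraph; requires_grad info is now propagated into it.
for _ in range(3):
    out = affine(x, w, b)
    out.sum().backward()

print(x.grad)        # None: no gradient is needed (or computed) for the input
print(w.grad.shape)  # torch.Size([32, 16])
```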
Test: `./bin/test_jit --gtest_filter="AutodiffRemoveUnusedGradientsTest.Linear"`
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D33725304
Pulled By: davidberard98
fbshipit-source-id: ca7ab4c9a6a26f94f93aff2d5a4135e125323ba1
(cherry picked from commit a97fe0556da1d74d04250c7cbcd1b8e9d8b41ebe)