[JIT] Autodiff - use more accurate requires_grad info
When autodiff constructs the Gradient object, it walks the forward
graph and records every output that requires grad into df_input_vjps.
At runtime, graph_executor.cpp detaches the input tensors before
running the autodiff forward graph, then sets requires_grad back on
the outputs that were recorded as needing it.
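In outline, the construction-time and run-time halves look roughly like
this. This is a standalone sketch with simplified stand-in types
(`Output`, `Tensor`, `Gradient`, `buildGradient`, `markOutputs`), not the
actual code in autodiff.cpp / graph_executor.cpp:

```cpp
#include <cstddef>
#include <vector>

struct Output {
  bool requires_grad = false;  // requires_grad flag on a forward-graph output
};

struct Tensor {
  bool requires_grad = false;  // runtime tensor produced by the forward graph
};

struct Gradient {
  // Indices of forward outputs whose gradients feed the backward graph.
  std::vector<std::size_t> df_input_vjps;
};

// Construction time: record every output that requires grad.
Gradient buildGradient(const std::vector<Output>& outputs) {
  Gradient grad;
  for (std::size_t i = 0; i < outputs.size(); ++i) {
    if (outputs[i].requires_grad) {
      grad.df_input_vjps.push_back(i);
    }
  }
  return grad;
}

// Run time: after the detached forward graph runs, restore requires_grad
// on exactly the outputs that were recorded above.
void markOutputs(const Gradient& grad, std::vector<Tensor>& results) {
  for (std::size_t idx : grad.df_input_vjps) {
    results[idx].requires_grad = true;
  }
}
```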
Previously, this check was done by simply calling
`output->requires_grad()`. But at the point where the profiling
executor invokes autodiff, the profiled information still lives on the
profile nodes, not on the values themselves. requires_grad was
therefore never set on the output values, and `requires_grad()`
defaulted to True for every tensor. As a result, more output tensors
than expected ended up requiring grad.
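One way to picture the fix: consult the profiled information when it
exists instead of falling back to the always-True default. A minimal
sketch with hypothetical names (in the real graph the profiled
requires_grad lives on prim::profile nodes; here both sources are
modeled as optionals):

```cpp
#include <optional>

struct ValueInfo {
  std::optional<bool> profiled_requires_grad;  // from a profile node, if any
  std::optional<bool> value_requires_grad;     // set on the value itself, if any
};

bool requiresGrad(const ValueInfo& v) {
  if (v.profiled_requires_grad.has_value()) {
    return *v.profiled_requires_grad;  // prefer the profiled information
  }
  if (v.value_requires_grad.has_value()) {
    return *v.value_requires_grad;
  }
  // Old behavior: with no information at all, default to true. The bug
  // hit this branch because the profiled info was never consulted.
  return true;
}
```

With the more accurate check, an output whose profiled requires_grad is
false no longer gets recorded into df_input_vjps, so it is no longer
re-marked as requiring grad at runtime.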
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78392
Approved by: https://github.com/eellison