Add thread local state guards in autograd engine hooks. (#60067)
Summary:
When hooks are called during backward, the thread-local state of the backward thread is not aligned with the GraphTask's `thread_local_`.
Aligning the two is required to collect profiling statistics for the c10d operations of the `DistributedDataParallel` module.
Is there any concern with adding a thread-local state guard when calling the hooks in backward? ezyang
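
For illustration only, the guard idea can be sketched in standalone C++. This is not the engine code from this PR: `ThreadState`, `StateGuard`, `Task`, `run_hooks` and `demo` are hypothetical names, and a single bool stands in for whatever the captured state really holds. The sketch assumes the task stores a snapshot of thread-local state and that an RAII guard installs it around the hook calls, then restores the previous state.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for per-thread state (for example, profiler settings).
struct ThreadState {
  bool profiling_enabled = false;
};

// Each thread sees its own copy of this variable.
thread_local ThreadState current_state;

// RAII guard: installs a captured state on construction and restores the
// previous state of the current thread on destruction.
class StateGuard {
 public:
  explicit StateGuard(const ThreadState& captured) : previous_(current_state) {
    current_state = captured;
  }
  ~StateGuard() { current_state = previous_; }
  StateGuard(const StateGuard&) = delete;
  StateGuard& operator=(const StateGuard&) = delete;

 private:
  ThreadState previous_;
};

// Hypothetical task that carries the state captured where it was created,
// plus the hooks to run on its behalf.
struct Task {
  ThreadState captured_state;
  std::vector<std::function<void()>> hooks;
};

// Runs every hook with the task's captured state installed.
void run_hooks(const Task& task) {
  StateGuard guard(task.captured_state);
  for (const auto& hook : task.hooks) {
    hook();
  }
}

struct Observation {
  bool inside_hook;
  bool after_hooks;
};

// The calling thread plays the role of the backward thread: its own state
// has profiling disabled, while the task was captured with profiling enabled.
Observation demo() {
  Task task;
  task.captured_state.profiling_enabled = true;

  Observation obs{false, true};
  task.hooks.push_back(
      [&obs] { obs.inside_hook = current_state.profiling_enabled; });

  assert(!current_state.profiling_enabled);  // state before the guard
  run_hooks(task);
  obs.after_hooks = current_state.profiling_enabled;
  return obs;
}
```

In this sketch the hook observes the captured state while the guard is alive, and the thread's own state comes back once `run_hooks` returns.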
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60067
Reviewed By: ezyang
Differential Revision: D29654599
Pulled By: albanD
fbshipit-source-id: 656c4f91017184fd40f1a184de24757a13387e37