add the worker IDs outside of addSendRpcBackward to ensure they are (#30914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30914
When tensors don't require grad, we don't call `addSendRpcBackward`, which is where we record the known workerIDs needed to clean up the dist autograd context later. But since https://github.com/pytorch/pytorch/pull/29781, we always include the autograd context ID in RPCs, even if tensors do not require grad. As a result, the contexts may never be released on some nodes.
Since the contexts are not cleaned up in this case, this can contribute to OOMs, which can be verified by running the unit test without this patch. We can fix the issue by moving the `addKnownWorkerIds` call into the `getMessageWithAutograd` function, so the destination worker is recorded on every RPC that carries an autograd context ID.
ghstack-source-id: 95178561
Test Plan: Added a unit test: `test_context_cleanup_tensor_no_grad`
Differential Revision: D18869191
fbshipit-source-id: b80f66bfd0dd7d01960abe1691d3f44095bb1b2b