Only populate grad accumulator to var mapping for find_unused_parameters=True in DDP (#45942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45942
The gradient accumulator to variable mapping is only needed for traversing the
autograd graph when find_unused_parameters=True. Without this change, we
populate and keep the mapping in memory unconditionally, which costs
sizeof(pointer) * (number of grad accumulators) of extra memory.
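The pattern can be illustrated with a minimal Python sketch (this is not the
actual C++ Reducer code; the `Reducer` class, its fields, and the use of `id()`
as a stand-in for accumulator pointers are all hypothetical, for illustration
only):

```python
class Reducer:
    """Sketch of a reducer that builds the accumulator->variable map
    only when unused-parameter detection is enabled."""

    def __init__(self, params, find_unused_parameters=False):
        self.find_unused_parameters = find_unused_parameters
        # Maps a stand-in for the grad accumulator pointer to the
        # parameter's index. Populated only when needed.
        self.grad_acc_to_variable = {}
        if find_unused_parameters:
            # Needed only for traversing the autograd graph to mark
            # parameters that did not receive gradients this iteration.
            for idx, p in enumerate(params):
                self.grad_acc_to_variable[id(p)] = idx


# When the flag is off, no per-accumulator map is kept around.
r_off = Reducer(["w", "b"], find_unused_parameters=False)
print(len(r_off.grad_acc_to_variable))  # 0

# When the flag is on, the map holds one entry per grad accumulator.
r_on = Reducer(["w", "b"], find_unused_parameters=True)
print(len(r_on.grad_acc_to_variable))  # 2
```

With the flag off, the sketch avoids holding one map entry per parameter,
mirroring the sizeof(pointer)-per-accumulator saving described above.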
ghstack-source-id: 114219289
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D24154407
fbshipit-source-id: 220d723e262f36590a03a3fd2dab47cbfdb87d40