Remove `finalize_bucket_sparse` from DDP (#40130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40130
The sparse gradients for the model and the tensor used to perform allreduce
in DDP are essentially the same object and share the same storage. As a
result, once the allreduce completes, the sparse gradients are automatically
updated, and unlike dense gradients, we don't need to assign the bucket's
contents back to the grad.
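To illustrate the idea (this is a minimal standalone sketch, not the actual DDP reducer code): when two tensor objects share storage, an in-place write through one is immediately visible through the other, so no explicit copy-back step is needed.

```python
import torch

# grad and bucket are distinct tensor objects backed by the same storage,
# mirroring the sparse-gradient/bucket relationship described above.
grad = torch.tensor([1.0, 2.0, 3.0])
bucket = grad.view(-1)   # a view: same storage, no copy

# Simulate an allreduce writing its result into the bucket in place.
bucket.add_(10.0)

# The gradient sees the update without any copy-back.
assert grad.tolist() == [11.0, 12.0, 13.0]
assert grad.data_ptr() == bucket.data_ptr()
```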
In addition, I've added a test for distributed autograd to ensure it works
correctly with sparse gradients. While writing this test, I discovered that
`finalize_bucket_sparse` was redundant: the test passed without requiring any
changes to `finalize_bucket_sparse`, which only looked at the `.grad` field.
ghstack-source-id: 106090063
Test Plan: waitforbuildbot
Differential Revision: D22080004
fbshipit-source-id: 493ce48b673f26b55dffd6894a3915dc769839f6