Fix Zero3 accuracy issue with contiguous grads and reduce_scatter=false (#4321)
This is a corner case hit when running Zero3 on the flan-t5 HF model.
The HF auto bucket size becomes exactly hidden_size^2, which is also
the exact size of some of the params. Due to the bug, those params
were not being reduced during backward.
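
For illustration, the size coincidence can be sketched as below. The
hidden_size value and the parameter names and shapes are assumptions
chosen for the example, not values from the flan-t5 run. The sketch only
shows when a param's element count equals the auto bucket size; it does
not reproduce the DeepSpeed code path.

```python
# Illustrative only: hidden_size and the shapes below are assumed values,
# not taken from the commit or from a real model config.
hidden_size = 1024

# Per the description above, the HF auto bucket size is hidden_size^2.
reduce_bucket_size = hidden_size * hidden_size

# Hypothetical parameter shapes: one square projection, one wider layer.
param_shapes = {
    "square_proj": (hidden_size, hidden_size),
    "wide_ffn": (4 * hidden_size, hidden_size),
}


def numel(shape):
    """Return the number of elements in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


# Params whose element count matches the bucket size exactly are the
# ones that fall into the corner case.
exact_fit = [
    name for name, shape in param_shapes.items()
    if numel(shape) == reduce_bucket_size
]
print(exact_fit)
```
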
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>