[dist_optim] fix the bug of none grads on functional optimizers (#62249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62249
The parameters and gradients passed to the torch.optim functional optimizers must always match in length, so we should skip parameters whose gradients are None to avoid a size mismatch.
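
A minimal sketch of the intended pattern (functional_sgd_step is a hypothetical stand-in for the functional optimizer kernel, not the actual torch.optim API): parameters with None gradients are filtered out before the call, so the parameter and gradient lists stay the same length.

    import torch

    def functional_sgd_step(params, grads, lr=0.01):
        # Hypothetical functional-style step: requires len(params) == len(grads),
        # mirroring the size-match requirement described above.
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(g, alpha=-lr)

    model = torch.nn.Linear(4, 2)
    model(torch.randn(1, 4)).sum().backward()

    # Gather parameters and gradients, skipping parameters whose .grad is None
    # (e.g. unused parameters), so both lists stay aligned.
    params_with_grad, grads = [], []
    for p in model.parameters():
        if p.grad is not None:
            params_with_grad.append(p)
            grads.append(p.grad)

    functional_sgd_step(params_with_grad, grads)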
ghstack-source-id: 134452467
Test Plan: test_dist_optim_none_grads
Reviewed By: mrshenli
Differential Revision: D29929653
fbshipit-source-id: 4ca6167fecdfe1db422236655edee3aa59b8b044