Integrate multi_tensor zero_grad into Optimizer base class (#69936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69936
Currently, the optimizers in `torch/optim/_multi_tensor/` all override the base `Optimizer` class's `zero_grad` with the same foreach-based implementation (e.g. [here](https://github.com/pytorch/pytorch/blob/master/torch/optim/_multi_tensor/adadelta.py#L93-L114)). A TODO there notes that this should be moved into the base class once the foreach ops are in good shape. This PR addresses that TODO.
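For context, the duplicated foreach `zero_grad` logic looks roughly like the sketch below. This is an illustration, not the code from this PR: `foreach_zero_grad` is a hypothetical standalone helper that takes a flat list of parameters instead of reading `self.param_groups`, and it assumes the private `torch._foreach_zero_` op is available in the installed build.

```python
from collections import defaultdict

import torch


def foreach_zero_grad(params, set_to_none=False):
    """Hypothetical helper illustrating a foreach-style zero_grad."""
    # Dense grads are bucketed by (device, dtype) so that each bucket
    # can be zeroed with a single foreach call.
    grads_by_device_and_dtype = defaultdict(lambda: defaultdict(list))
    for p in params:
        if p.grad is None:
            continue
        if set_to_none:
            p.grad = None
            continue
        # Detach the grad from any autograd graph before zeroing in place.
        if p.grad.grad_fn is not None:
            p.grad.detach_()
        else:
            p.grad.requires_grad_(False)
        if p.grad.is_sparse:
            # Sparse grads are zeroed one at a time in this sketch.
            p.grad.zero_()
        else:
            grads_by_device_and_dtype[p.grad.device][p.grad.dtype].append(p.grad)
    for per_dtype in grads_by_device_and_dtype.values():
        for grads in per_dtype.values():
            torch._foreach_zero_(grads)


# Usage: two parameters with different dtypes land in separate buckets.
params = [
    torch.nn.Parameter(torch.ones(3)),
    torch.nn.Parameter(torch.ones(2, 2, dtype=torch.float64)),
]
for p in params:
    p.grad = torch.ones_like(p)
foreach_zero_grad(params)
```

Grouping by device and dtype is the part every `_multi_tensor` optimizer repeated, which is why a single copy in the base class is attractive.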
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D33346748
Pulled By: mikaylagawarecki
fbshipit-source-id: 6573f4776aeac757b6a778894681868191a1b4c7