[caffe2] SWA operator (#34394)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34394
# SWA operator
In this diff, we add a new operator, `SWA`, which will be used in `AdaGradOptimizer`.
The algorithm looks like:
{F230902995}
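The attached figure does not render outside Phabricator. As a hedged sketch only: if `SWA` here follows the standard stochastic weight averaging formulation (a running mean over weight iterates), a single update step would look like the following. The function and variable names are illustrative, not the actual operator's schema.

```python
import numpy as np

def swa_update(w_avg, w, n):
    """One step of a running (stochastic) weight average.

    w_avg: averaged weights over the first n snapshots
    w:     newly produced weights (e.g. after an optimizer step)
    n:     number of snapshots already averaged
    Returns the average over n + 1 snapshots.
    """
    return (w_avg * n + w) / (n + 1)

# Applying the update repeatedly reproduces the plain mean of all iterates.
iterates = [np.array([1.0, 2.0, 3.0]),
            np.array([3.0, 2.0, 1.0]),
            np.array([2.0, 2.0, 2.0])]
w_avg = np.zeros(3)
for n, w in enumerate(iterates):
    w_avg = swa_update(w_avg, w, n)
# w_avg == np.mean(iterates, axis=0) == [2., 2., 2.]
```

The incremental form avoids storing all past snapshots; only the current average and a counter are kept, which is what makes it practical inside an optimizer operator.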
# Background
In our tests, we found that this operator significantly improves our models' reproducibility (KT: 0.86 -> 0.92).
We therefore hope to land this operator and, in the future, enable it by default in our models.
Test Plan:
Local build of `aml.dper3:30f068668cfb408fbb40141fb17129f2` and the bento kernel.
- Local test: n215857
- f174600345
Reviewed By: chocjy
Differential Revision: D20165239
fbshipit-source-id: c03cdd048cb10b091e5f06323f4c0f3999f95d8a