[caffe2] L2 regularization for rowwise fused sparse adagrad (#37653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37653
Follow-up to D21320243: add weight_decay to rowwise fused sparse adagrad. This is more involved because we can't reuse g_sq_avg multiple times.
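For illustration, here is a minimal pure-Python sketch of a single-row update with weight decay. It is not the fused kernel from this change. It assumes the L2 term is folded into the gradient as g + weight_decay * w before the row average g_sq_avg is taken, and that a positive lr is subtracted. The function name `rowwise_adagrad_step` and its signature are hypothetical.

```python
import math


def rowwise_adagrad_step(w_row, g_row, moment, lr, epsilon=1e-5, weight_decay=0.0):
    """One rowwise Adagrad update for a single embedding row (illustrative only).

    Assumptions (not taken from the actual kernel): lr is positive and is
    subtracted; weight decay is applied as g + weight_decay * w.
    """
    # Fold the L2 term into the gradient first; this needs the current weights.
    g_adj = [g + weight_decay * w for g, w in zip(g_row, w_row)]

    # Rowwise Adagrad keeps one scalar moment per row: the running sum of the
    # mean squared (adjusted) gradient over the row.
    g_sq_avg = sum(g * g for g in g_adj) / len(g_adj)
    new_moment = moment + g_sq_avg

    # The same step size is shared by every element of the row.
    step = lr / (math.sqrt(new_moment) + epsilon)
    new_w = [w - step * g for w, g in zip(w_row, g_adj)]
    return new_w, new_moment
```

In this sketch g_sq_avg depends on the weights once weight_decay is nonzero, so it has to be computed from the adjusted gradient of the row being updated.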
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D21335643
fbshipit-source-id: 491b385c5eb9c0d1e3d31a1cf50d7eb450c2d39d