Integrate RAdam into SparseAdamOp
Summary:
T53944549 aims to integrate the [`RAdam`](https://arxiv.org/pdf/1908.03265.pdf) optimizer into `Adam`. As a first step, this diff integrates `RAdam` into `SparseAdamOp` on the CPU platform.
Note that `RAdam` support for `adam_op.cc` and `adam_op_gpu.cu` may be implemented in other diffs.
The implementation of `RAdam` follows the algorithm below:
{F220259279}
For comparison, the algorithm of [`Adam`](https://arxiv.org/pdf/1412.6980.pdf) is attached:
{F220389971}
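For readers without access to the attached figures, here is a minimal Python sketch of one `RAdam` step applied only to the rows selected by a sparse gradient, following Algorithm 2 of the RAdam paper. It is an illustration, not the code in this diff: the function names `rho_t` and `radam_sparse_step`, the list-of-rows layout, the `eps` term in the denominator, and the use of a single global `step` counter are all assumptions. The threshold `rho > 4` comes from the paper; the actual operator may use a different cutoff.

```python
import math


def rho_t(step, beta2):
    """Length of the approximated SMA at a given step (RAdam paper, Alg. 2)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    return rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)


def radam_sparse_step(param, m, v, indices, grads, step,
                      lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update, in place, only the rows of `param` listed in `indices`.

    `param`, `m`, `v` are lists of rows (lists of floats); `grads[k]` is the
    gradient for row `indices[k]`. `step` starts at 1. `eps` is an assumption
    added for numerical safety; it does not appear in the paper's pseudocode.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho = rho_t(step, beta2)
    bias1 = 1.0 - beta1 ** step
    bias2 = 1.0 - beta2 ** step

    for row, g_row in zip(indices, grads):
        for j, g in enumerate(g_row):
            # First and second moment estimates, as in Adam.
            m[row][j] = beta1 * m[row][j] + (1.0 - beta1) * g
            v[row][j] = beta2 * v[row][j] + (1.0 - beta2) * g * g
            m_hat = m[row][j] / bias1

            if rho > 4.0:
                # Variance is tractable: apply the rectified adaptive step.
                r = math.sqrt(((rho - 4.0) * (rho - 2.0) * rho_inf)
                              / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho))
                v_hat = math.sqrt(v[row][j] / bias2)
                param[row][j] -= lr * r * m_hat / (v_hat + eps)
            else:
                # Early steps: fall back to an SGD-with-momentum update.
                param[row][j] -= lr * m_hat
```

With the default `beta2=0.999`, `rho_t(1, 0.999)` is approximately 1, so the first few steps take the unrectified branch and behave like bias-corrected momentum SGD. The adaptive term only takes effect once `rho` exceeds the threshold.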
Test Plan: Run `buck build caffe2` successfully.
Reviewed By: wx1988
Differential Revision: D18239578
fbshipit-source-id: fdc028261ee20986cae1f30f1d26d8705587331a