Add Rectified Adam (RAdam) to the multi-tensor optimizers API (#59161)
Summary:
A previous PR (https://github.com/pytorch/pytorch/issues/58968) added RAdam to the optimizers. This PR proposes a multi-tensor version of RAdam for PyTorch.
RAdam was proposed by Liyuan Liu et al. in the paper https://arxiv.org/pdf/1908.03265.pdf
It is one of the most widely used algorithms in the deep learning community.
Unlike the paper, we use 5 rather than 4 as the variance tractability cutoff, following common practice.
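To illustrate what the cutoff changes, here is a minimal scalar sketch of the RAdam update as described in the paper, with the cutoff exposed as a parameter. It is not the multi-tensor implementation from this PR; the names `rho`, `radam_step`, `first_rectified_step`, and `cutoff` are hypothetical and chosen only for this example, and the placement of `eps` is an assumption.

```python
import math


def rho(t, beta2):
    """Approximated SMA length rho_t at step t (t starts at 1)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    return rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)


def radam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, cutoff=5.0):
    """One scalar RAdam step; returns the updated (param, m, v)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment

    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho(t, beta2)

    if rho_t > cutoff:
        # Variance is considered tractable: apply the rectified adaptive step.
        rect = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                         / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
        adaptive = math.sqrt(1.0 - beta2 ** t) / (math.sqrt(v) + eps)
        param -= lr * m_hat * rect * adaptive
    else:
        # Variance is not tractable yet: fall back to a momentum-only step.
        param -= lr * m_hat
    return param, m, v


def first_rectified_step(beta2, cutoff):
    """First step t at which rho_t exceeds the cutoff."""
    t = 1
    while rho(t, beta2) <= cutoff:
        t += 1
    return t


if __name__ == "__main__":
    print(first_rectified_step(0.999, 5.0))  # cutoff used in this PR
    print(first_rectified_step(0.999, 4.0))  # cutoff from the paper
```

With beta2 = 0.999, rho_t stays slightly below t for small t, so in this sketch a cutoff of 5 delays the first rectified step from step 5 to step 6 compared with a cutoff of 4.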
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59161
Reviewed By: vincentqb
Differential Revision: D29360576
Pulled By: iramazanli
fbshipit-source-id: 7ccdbf12b1ee7f12e66f7d7992123a70cc818b6b