Add the Nesterov Adam (NAdam) algorithm to the multi-tensor optimizers API (#59165)
Summary:
A previous PR, https://github.com/pytorch/pytorch/issues/59009, added NAdam to the optimizers. This PR proposes a multi-tensor version of NAdam for PyTorch.
NAdam was proposed by Timothy Dozat in the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ and in the report http://cs229.stanford.edu/proj2015/054_report.pdf.
It is one of the most widely used algorithms in the deep learning community.
It is worth noting that this implementation of NAdam is inspired by the Keras implementation:
https://github.com/tensorflow/tensorflow/blob/f9d386849581d15d72f6f1f96f12aac230a8edbe/tensorflow/python/keras/optimizer_v2/nadam.py
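
For readers unfamiliar with the update rule, here is a minimal pure-Python sketch of one NAdam step applied to a whole list of parameters at once. It is an illustration, not the code in this PR: the function names, the use of scalar floats in place of tensors, and the default hyperparameters (including the 0.96 momentum schedule constant and `momentum_decay`) are assumptions based on the commonly documented form of the algorithm. A real multi-tensor implementation would replace the per-parameter loop with batched tensor operations over the lists.

```python
import math


def nadam_step(params, grads, exp_avgs, exp_avg_sqs, mu_products, step,
               lr=2e-3, beta1=0.9, beta2=0.999, eps=1e-8, momentum_decay=4e-3):
    """Apply one NAdam update in place to lists of scalar parameters.

    Hypothetical helper for illustration; `step` starts at 1.
    """
    bias_correction2 = 1 - beta2 ** step
    # Momentum schedule for the current and the next step (assumed constants).
    mu = beta1 * (1 - 0.5 * 0.96 ** (step * momentum_decay))
    mu_next = beta1 * (1 - 0.5 * 0.96 ** ((step + 1) * momentum_decay))

    for i, grad in enumerate(grads):
        mu_products[i] *= mu
        # Exponential moving averages of the gradient and its square.
        exp_avgs[i] = beta1 * exp_avgs[i] + (1 - beta1) * grad
        exp_avg_sqs[i] = beta2 * exp_avg_sqs[i] + (1 - beta2) * grad * grad
        denom = math.sqrt(exp_avg_sqs[i] / bias_correction2) + eps
        # Gradient term, then the Nesterov look-ahead momentum term.
        params[i] -= lr * (1 - mu) / (1 - mu_products[i]) * grad / denom
        params[i] -= (lr * mu_next / (1 - mu_products[i] * mu_next)
                      * exp_avgs[i] / denom)


def run_demo(num_steps):
    """Minimize f(x) = x^2 for two independent scalars; return the parameters."""
    params = [3.0, -2.0]
    exp_avgs = [0.0, 0.0]
    exp_avg_sqs = [0.0, 0.0]
    mu_products = [1.0, 1.0]
    for step in range(1, num_steps + 1):
        grads = [2.0 * p for p in params]  # gradient of x^2
        nadam_step(params, grads, exp_avgs, exp_avg_sqs, mu_products, step)
    return params


if __name__ == "__main__":
    print(run_demo(100))
```

Each call updates every parameter in the lists during a single pass, which is the shape of the API a multi-tensor optimizer exposes; the state lists (`exp_avgs`, `exp_avg_sqs`, `mu_products`) are kept by the caller between steps.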
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59165
Reviewed By: vincentqb
Differential Revision: D29360577
Pulled By: iramazanli
fbshipit-source-id: 0fe14016303b2df2cb8cc31912a2674acf63d1e5