Update Adam documentation (#41679)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/41477
The Adam implementation performs L2 regularization (the weight decay term is added to the gradient), not decoupled weight decay. The change proposed in https://github.com/pytorch/pytorch/issues/41477, however, was motivated by Line 12 of Algorithm 2 in the [Decoupled Weight Decay Regularization](https://arxiv.org/pdf/1711.05101.pdf) paper, which describes the decoupled variant.
Please let me know if you have other suggestions about how to deliver this info in the docs.
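
For reviewers who want the distinction above spelled out, here is a minimal scalar sketch of one optimizer step under both schemes. It is illustrative only and is not the code in `torch/optim`: the function name `adam_step` and the `decoupled` flag are hypothetical, and the decoupled branch scales the decay by `lr`, which is an assumption about how the decay is parameterized.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.01, betas=(0.9, 0.999),
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam step on a scalar parameter (hypothetical helper).

    decoupled=False: L2 regularization, the decay term is folded into the
                     gradient and therefore passes through the moment estimates.
    decoupled=True:  decoupled weight decay, the parameter is shrunk directly
                     and the moment estimates only see the raw gradient.
    """
    beta1, beta2 = betas

    if decoupled:
        # Assumption: decay is scaled by the learning rate.
        param = param * (1.0 - lr * weight_decay)
    else:
        # L2 penalty: add weight_decay * param to the gradient.
        grad = grad + weight_decay * param

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias-corrected estimates.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# With a zero loss gradient, only the weight decay term moves the parameter.
p_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1)
p_dec, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1,
                        decoupled=True)
print(p_l2, p_dec)
```

In the L2 case the decay term is rescaled by the adaptive denominator, so the two schemes produce different updates even for the same `weight_decay` value.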
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41679
Reviewed By: izdeby
Differential Revision: D22671329
Pulled By: vincentqb
fbshipit-source-id: 2caf60e4f62fe31f29aa35a9532d1c6895a24224