change LBFGS's default tolerance_grad to 1e-7 (#25240)
Summary:
Hi,
I noticed that after v1.2.0 the implementation of the LBFGS optimizer changed. In the new implementation, the convergence condition checks the max absolute value of the gradients instead of their sum (see: https://github.com/pytorch/pytorch/blob/b15d91490aa4277d5bc8dce033261092b239134b/torch/optim/lbfgs.py#L313). However, the default tolerance_grad parameter was not updated accordingly (it is too large for a max-of-gradients check), so much of my old code either does not optimize at all or stops after only one or two steps.
So I opened this pull request to suggest changing the default tolerance_grad to a smaller value (1e-7).
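To illustrate the issue, here is a minimal sketch (not the actual LBFGS code) of why a tolerance tuned for a sum-of-gradients check triggers too eagerly once the check becomes max-of-gradients. The helper names and the sample gradient values are hypothetical, chosen only to demonstrate the effect of the old 1e-5 default versus the proposed 1e-7:

```python
# Hypothetical helpers mimicking the two convergence conditions.
# Old check (pre-v1.2.0 style): stop when the SUM of |grad| is small.
def converged_sum(grads, tol):
    return sum(abs(g) for g in grads) <= tol

# New check (post-v1.2.0 style): stop when the MAX of |grad| is small.
def converged_max(grads, tol):
    return max(abs(g) for g in grads) <= tol

# Example: 100 parameters, each with gradient magnitude 5e-6.
grads = [5e-6] * 100

# With the old sum-based check, the old default tol=1e-5 does NOT stop
# (the sum is 5e-4), so optimization continues.
print(converged_sum(grads, 1e-5))   # False

# With the new max-based check, the same tol=1e-5 stops immediately
# (the max is 5e-6), i.e. the optimizer quits after one or two steps.
print(converged_max(grads, 1e-5))   # True

# The proposed smaller default tol=1e-7 restores the expected behavior
# under the max-based check: optimization keeps going.
print(converged_max(grads, 1e-7))   # False
```

In real code the same effect can be avoided explicitly by passing the tolerance, e.g. torch.optim.LBFGS(params, tolerance_grad=1e-7).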
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25240
Differential Revision: D17102713
Pulled By: vincentqb
fbshipit-source-id: d46acacdca1c319c1db669f75da3405a7db4a7cb