jax
007cdf2f - Adds additional epsilon to adam for numerical stability. (#3091)

Commit
5 years ago
Adds additional epsilon to adam for numerical stability. (#3091)

* Adds additional epsilon to adam for numerical stability.

  Meta-gradients through the adam optimizer diverge, because the derivative of the adam scaling with respect to the gradients gets an additional 1/sqrt(g) factor. This additional factor is unregularized without the second epsilon added in this commit.

* Renames eps2 to eps_root and improves docstring

Co-authored-by: Thomas Keck <thomaskeck@google.com>
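The effect described in the message can be reproduced with a minimal sketch. This is not the code touched by the commit: `adam_scaled_update` is a hypothetical helper that computes only the first bias-corrected Adam step from zero-initialised moments, and the default hyperparameters are illustrative assumptions. Only the names `eps` and `eps_root` are taken from the commit message.

```python
import jax
import jax.numpy as jnp


def adam_scaled_update(g, eps=1e-8, eps_root=0.0, b1=0.9, b2=0.999):
    """Hypothetical helper: first Adam step, starting from zero moments."""
    mu = (1.0 - b1) * g          # first-moment estimate after one step
    nu = (1.0 - b2) * g ** 2     # second-moment estimate after one step
    mu_hat = mu / (1.0 - b1)     # bias correction for step count 1
    nu_hat = nu / (1.0 - b2)
    # eps regularizes the division; eps_root sits inside the square root,
    # so it also regularizes the derivative of sqrt at nu_hat == 0.
    return mu_hat / (jnp.sqrt(nu_hat + eps_root) + eps)


# Differentiate the scaled update with respect to the incoming gradient,
# which is the quantity a meta-gradient has to pass through.
d_update = jax.grad(adam_scaled_update)

g = jnp.array(0.0)
d_without = d_update(g, eps_root=0.0)   # sqrt is differentiated at exactly 0
d_with = d_update(g, eps_root=1e-8)     # sqrt is differentiated at 1e-8
print(d_without, d_with)
```

With `eps_root=0.0` the backward pass divides by `sqrt(0)`, so the derivative at a zero gradient should come out non-finite, while a small positive `eps_root` keeps it finite. The outer `eps` alone does not help here, because it is added after the square root has already been taken.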