Adds additional epsilon to adam for numerical stability. (#3091)
* Adds additional epsilon to adam for numerical stability.
Meta-gradients through the Adam optimizer diverge because the derivative
of the Adam scaling with respect to the gradients gets an additional
1/sqrt(g) factor. Without the second epsilon added in this commit, this
factor is unregularized and grows without bound as g approaches zero.
* Renames eps2 to eps_root and improves the docstring.
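
The effect of eps_root can be illustrated with a minimal sketch. It assumes
the Adam scaling has the form m_hat / (sqrt(v_hat + eps_root) + eps), with
eps_root added inside the square root. That placement is inferred from the
parameter name and is not spelled out in this message. The helper names
adam_scale and sqrt_term_derivative are hypothetical and are not part of the
library.

```python
import math


def adam_scale(m_hat, v_hat, eps=1e-8, eps_root=0.0):
    """Assumed Adam scaling: m_hat / (sqrt(v_hat + eps_root) + eps)."""
    return m_hat / (math.sqrt(v_hat + eps_root) + eps)


def sqrt_term_derivative(v_hat, eps_root=0.0):
    """Chain-rule factor d/dv sqrt(v + eps_root) = 1 / (2 * sqrt(v + eps_root))."""
    root = math.sqrt(v_hat + eps_root)
    # With eps_root == 0 and v_hat == 0 the factor is unbounded.
    return math.inf if root == 0.0 else 0.5 / root


# Compare the factor with and without eps_root as the second moment shrinks.
for v_hat in (1e-4, 1e-12, 0.0):
    without = sqrt_term_derivative(v_hat)
    with_root = sqrt_term_derivative(v_hat, eps_root=1e-8)
    print(f"v_hat={v_hat:g}  no eps_root: {without:g}  eps_root=1e-8: {with_root:g}")
```

Under this assumption the factor is capped at 1 / (2 * sqrt(eps_root)) when
eps_root is positive. The outer eps does not provide this cap because it sits
outside the square root.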
Co-authored-by: Thomas Keck <thomaskeck@google.com>