flax
bb1f8073 - Add AdaBelief in flax.optim

Add AdaBelief to flax.optim. AdaBelief adapts the step size according to the "belief" in the observed gradient, and achieves good generalization, fast convergence, and stable training. Reference: [AdaBelief optimizer: adapting stepsizes by the belief in observed gradients](https://arxiv.org/abs/2010.07468) (Juntang Zhuang et al., NeurIPS 2020).

PiperOrigin-RevId: 391187120