Add AdaBelief to flax.optim. AdaBelief adapts the stepsize according to the "belief" in the observed gradient, and achieves good generalization, fast convergence, and training stability.
Reference: [AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://arxiv.org/abs/2010.07468) (Juntang Zhuang et al., NeurIPS 2020).
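For illustration, a minimal sketch of the AdaBelief update rule from the paper. This is not the flax.optim implementation; the function name and signature are hypothetical, and it handles a single parameter tensor rather than a full pytree:

```python
import jax.numpy as jnp

def adabelief_update(param, grad, mu, s, step,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    """One AdaBelief step for a single parameter tensor (illustrative sketch).

    Unlike Adam, the second moment `s` tracks the squared deviation of the
    gradient from its EMA `mu` (the "belief" in the gradient), rather than
    the raw squared gradient.
    """
    mu = beta1 * mu + (1.0 - beta1) * grad
    s = beta2 * s + (1.0 - beta2) * (grad - mu) ** 2 + eps
    # Bias correction, as in Adam.
    mu_hat = mu / (1.0 - beta1 ** step)
    s_hat = s / (1.0 - beta2 ** step)
    new_param = param - lr * mu_hat / (jnp.sqrt(s_hat) + eps)
    return new_param, mu, s
```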
PiperOrigin-RevId: 391187120