Add AdaBelief in flax.optim, which adapts the step size according to the "belief" in the gradient, and achieves good generalization, fast convergence, and training stability. #1488
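For context, below is a minimal sketch of the AdaBelief update rule in plain JAX. This is not the flax.optim implementation added by this PR; the function and state names (`adabelief_update`, `AdaBeliefState`) and the hyperparameter defaults are illustrative assumptions. The key difference from Adam is that the second moment tracks the squared deviation of the gradient from its EMA prediction `m`, rather than the squared gradient itself: a small deviation signals high "belief" in the current gradient direction, yielding a larger effective step.

```python
# Minimal sketch of the AdaBelief update rule in plain JAX.
# NOT the flax.optim implementation from this PR; names and defaults
# here are illustrative assumptions.

import jax
import jax.numpy as jnp
from typing import NamedTuple


class AdaBeliefState(NamedTuple):
    step: jnp.ndarray  # update count, used for bias correction
    m: jnp.ndarray     # first moment: EMA of gradients (as in Adam)
    s: jnp.ndarray     # second moment: EMA of (g - m)^2, the "belief" term


def adabelief_update(param, grad, state,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-16):
    step = state.step + 1
    # First moment: EMA of the gradient, identical to Adam.
    m = beta1 * state.m + (1.0 - beta1) * grad
    # Second moment: EMA of the squared deviation of the gradient from
    # its EMA prediction m. Small deviation -> high "belief" -> larger step.
    s = beta2 * state.s + (1.0 - beta2) * (grad - m) ** 2 + eps
    # Bias correction, as in Adam.
    m_hat = m / (1.0 - beta1 ** step)
    s_hat = s / (1.0 - beta2 ** step)
    new_param = param - lr * m_hat / (jnp.sqrt(s_hat) + eps)
    return new_param, AdaBeliefState(step=step, m=m, s=s)


# Toy usage: one step on a scalar quadratic.
param = jnp.array(2.0)
state = AdaBeliefState(step=jnp.array(0), m=jnp.zeros(()), s=jnp.zeros(()))
grad = jax.grad(lambda p: (p - 1.0) ** 2)(param)
param, state = adabelief_update(param, grad, state)
```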
copybara-service changed the title from "adabelief in flax" to "Add AdaBelief in flax.optim, which adapts stepsize according to "belief" in gradient, and achieves good generalization, fast convergence and training stability." (4 years ago)