flax
637b9f6c - Add LAMB optimizer

Commit · 6 years ago
Add LAMB optimizer

This is similar to LARS, but with Adam instead of momentum as the wrapped update rule (and a couple of other differences). It is popular for large-batch transformer training.
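For reference, a minimal sketch of a single LAMB step, not the actual Flax implementation: it computes an Adam-style update with decoupled weight decay, then scales it layer-wise by a trust ratio ||param|| / ||update||, which is the key difference from plain Adam. All hyperparameter names and defaults here are illustrative assumptions.

```python
import jax.numpy as jnp

def lamb_update(param, grad, m, v, step, lr=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-6, weight_decay=0.01):
    # Adam-style first and second moment estimates (illustrative sketch).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    # Adam direction plus decoupled weight decay on the parameters.
    update = m_hat / (jnp.sqrt(v_hat) + eps) + weight_decay * param
    # Layer-wise trust ratio: rescale the update by ||param|| / ||update||,
    # falling back to 1.0 when either norm is zero.
    param_norm = jnp.linalg.norm(param)
    update_norm = jnp.linalg.norm(update)
    trust_ratio = jnp.where((param_norm > 0) & (update_norm > 0),
                            param_norm / update_norm, 1.0)
    new_param = param - lr * trust_ratio * update
    return new_param, m, v
```

In practice the trust ratio is computed per layer (per parameter array), so large and small layers each take appropriately sized steps at large batch sizes.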