Implements AdaBelief, an adaptive method that modifies the stepsize according
to the belief in the current gradient prediction. If the observed gradient
deviates considerably from the prediction, a small step is taken. If the
prediction and the observed gradient agree, then a large step is taken.
Reference: [AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed
Gradients](https://arxiv.org/pdf/2010.07468.pdf) (Zhuang et al., 2020).
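
For illustration only, the update rule described above can be sketched for a
single scalar parameter in plain Python. This is a minimal sketch based on the
referenced paper, not the code added in this change: the names
`adabelief_step` and `last_step_size` are hypothetical, and the default
hyperparameters (`lr`, `b1`, `b2`, `eps`) are assumed values.

```python
import math


def adabelief_step(param, grad, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-16):
    """One scalar AdaBelief update; t is the 1-based step count.

    Hypothetical helper with assumed default hyperparameters.
    """
    # Exponential moving average of gradients: the "prediction".
    m = b1 * m + (1 - b1) * grad
    # Moving average of the squared deviation of the observed gradient
    # from the prediction: large when the two disagree.
    s = b2 * s + (1 - b2) * (grad - m) ** 2 + eps
    # Bias correction, as in Adam.
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    # Small s_hat (strong belief) gives a large step, and vice versa.
    param = param - lr * m_hat / (math.sqrt(s_hat) + eps)
    return param, m, s


def last_step_size(grads):
    """Run AdaBelief from param=0 and return the magnitude of the final step."""
    param, m, s = 0.0, 0.0, 0.0
    step = 0.0
    for t, g in enumerate(grads, start=1):
        new_param, m, s = adabelief_step(param, g, m, s, t)
        step = abs(new_param - param)
        param = new_param
    return step


consistent = last_step_size([1.0] * 50)       # gradients agree with the prediction
noisy = last_step_size([1.0, -1.0] * 25)      # gradients keep contradicting it
print(consistent > noisy)
```

With a constant gradient the prediction quickly matches the observations, so
the deviation term shrinks and the step grows. With a sign-flipping gradient
the deviation stays large and the step stays small.
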
PiperOrigin-RevId: 355497424