Refactor training loop.
- Use per_device_batch_size: the name is unambiguous (batch_size is an overloaded term) and makes it easy to scale from a single device to many machines.
- Standardize examples on eval_every_steps, checkpoint_every_steps, etc.
- Move eval and translation into separate methods to make the training loop shorter.
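
A minimal sketch of how these settings might be used, not the code in this change. Only per_device_batch_size, eval_every_steps and checkpoint_every_steps come from the notes above; the Config class, num_train_steps, the default values and both helper functions are hypothetical, and the device count is passed in explicitly instead of being queried from the runtime.

```python
import dataclasses


@dataclasses.dataclass
class Config:
    # Hypothetical config object; the default values are illustrative only.
    per_device_batch_size: int = 32
    eval_every_steps: int = 100
    checkpoint_every_steps: int = 250
    num_train_steps: int = 500  # Assumed field, not named in this change.


def global_batch_size(config, device_count):
    """Total batch size across all devices for one training step."""
    return config.per_device_batch_size * device_count


def scheduled_steps(config):
    """Returns the steps at which eval and checkpointing would run.

    Assumes both also run on the final step, which is a common convention
    but an assumption here.
    """
    eval_steps, checkpoint_steps = [], []
    for step in range(1, config.num_train_steps + 1):
        is_last_step = step == config.num_train_steps
        if step % config.eval_every_steps == 0 or is_last_step:
            eval_steps.append(step)
        if step % config.checkpoint_every_steps == 0 or is_last_step:
            checkpoint_steps.append(step)
    return eval_steps, checkpoint_steps
```

Because the per-device size stays fixed, moving from 1 to 8 devices changes only the derived global batch size (32 to 256 with the defaults above); the config itself does not need to be edited.
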
PiperOrigin-RevId: 349533009