Make batch size documentation clearer (#5072)

The config variable for accumulation steps is
`gradient_accumulation_steps`, but the note at the top of the docs on
batch size related parameters calls it `gradient_accumulation`. Anyone
using that note as a configuration reference could misconfigure their
setup, and the docs are harder to read because it is not obvious that
`gradient_accumulation` refers to `gradient_accumulation_steps`.
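
For illustration, a minimal sketch of why the exact key name matters.
It assumes the usual relation between the batch size parameters
(effective batch size = micro batch size per GPU x accumulation steps x
number of GPUs). The helper `expected_train_batch_size`, its
`world_size` argument, and the `train_micro_batch_size_per_gpu` key are
assumptions made for this example and are not taken from this change.

```python
# Hypothetical helper: computes the effective batch size from a config dict.
# Only the key name gradient_accumulation_steps comes from this commit;
# everything else here is an assumption for illustration.
def expected_train_batch_size(config: dict, world_size: int) -> int:
    micro_batch = config["train_micro_batch_size_per_gpu"]
    # The key is gradient_accumulation_steps, not gradient_accumulation.
    accumulation_steps = config["gradient_accumulation_steps"]
    return micro_batch * accumulation_steps * world_size


config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
}

# 4 samples per GPU x 8 accumulation steps x 2 GPUs
print(expected_train_batch_size(config, world_size=2))  # 64
```

A config that used `gradient_accumulation` as the key would raise a
`KeyError` in this sketch, which is the kind of mismatch the corrected
note is meant to prevent.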

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>