Group TensorBoard metrics (#39)
* Group training metrics
* Group validation metrics
* Plot metrics against both steps and samples
* Log iteration time and speed (as samples per second or iterations per second)
* Group timer metrics in TensorBoard (from `log_timers_to_tensorboard`)
* Add a comment on the writing condition
* Update megatron/global_vars.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update megatron/training.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Link the bug-fix issue on the Megatron-LM side
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
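
A minimal sketch of the grouping idea, assuming TensorBoard's default behavior of grouping scalar charts by the tag prefix before the first `/`. The group names (`lm-loss-training`, `iteration-time`), the ` vs samples` suffix, and the helpers `grouped_scalars` and `throughput` are hypothetical illustrations, not the code in this commit.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: emit each metric twice under one group prefix,
# once against the iteration (step) and once against consumed samples.
def grouped_scalars(group: str, metrics: Dict[str, float], iteration: int,
                    consumed_samples: int) -> List[Tuple[str, float, int]]:
    records = []
    for name, value in metrics.items():
        tag = f"{group}/{name}"  # prefix before "/" becomes the TensorBoard group
        records.append((tag, value, iteration))
        records.append((f"{tag} vs samples", value, consumed_samples))
    return records

# Hypothetical helper: derive speed from elapsed wall time over a logging interval.
def throughput(elapsed_seconds: float, iterations: int,
               global_batch_size: int) -> Dict[str, float]:
    seconds_per_iteration = elapsed_seconds / iterations
    return {
        "iteration-time/iteration-time": seconds_per_iteration,
        "iteration-time/samples per second": global_batch_size / seconds_per_iteration,
        "iteration-time/iterations per second": 1.0 / seconds_per_iteration,
    }

if __name__ == "__main__":
    for tag, value, x in grouped_scalars("lm-loss-training", {"lm loss": 2.5}, 10, 5120):
        print(tag, value, x)
    print(throughput(2.0, 4, 8))
```

Each `(tag, value, x)` tuple maps directly onto `SummaryWriter.add_scalar(tag, value, x)` from `torch.utils.tensorboard`; keeping the step and sample variants under the same prefix places both charts side by side in one group.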