Floating-point ops counting and reloading (#40)
* initial FLO count/logging setup (need to fix model parameter count)
* 1B3 parameter setup + FLOs counting
* 1B3 parameter setup
* synced with latest 13B script
* pipe transformer docstring
* improve DS integration evaluation + logging
* use pp engine even for pp=1 (#6)
* removed slurm_examples
* FLOs reloading
* Update megatron/training.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* Update megatron/data/gpt_dataset.py
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
* Update megatron/utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update megatron/utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* formatting fix, deferring the bug to a separate fix; added FLO logging to TensorBoard groups
* fixed indentation bug
* fixing possible double counts
* tweaks
* warning for double counts
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: TevenLeScao <uhk85as@jean-zay1.idris.fr>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>