DeepSpeed
60a3e89e - use `non_reentrant_checkpoint` fix requires_grad of input must be true for activation checkpoint layer in pipeline train. (#4224)

Commit
2 years ago
use `non_reentrant_checkpoint` fix requires_grad of input must be true for activation checkpoint layer in pipeline train. (#4224)

* feat: add `non_reentrant_checkpoint`
* feat: add missing output postprocess and change the hook to record leaf forward tensor refs
* fix: make the multi_grad_hook registered after graph construction
* fix: backward compatibility for multi_tensor_hook
* fix: nonlocal reference error of deepspeed_saved_tensors
* fix: reduce repeating hook registration
* test: add test for `activation_checkpointing.checkpointing.non_reentrant_checkpoint`
* Pass correct node size for ZeRO++ (#4085)
  * Pass correct node size
  * formatting

  ---------

  Co-authored-by: Connor Holmes <development@cmikeh2.me>
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* add deepspeed chat arxiv report (#4110)
  * add deepspeed chat arxiv report
  * add zeroquant v2 and fp
  * add selective enhancement
  * add ignore for 'Youn' in spell checker

  ---------

  Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* style: change flake8 detected style mismatch
* test: hack to clone the `test_activation_checkpointing` module for reuse and add regression tests
* doc: explain the introduction of `non_reentrant_checkpoint`
* doc: explain the test of `non_reentrant_checkpoint`
* apply non_reentrant_checkpoint in pipeline parallel training
* ut pass
* fix ci
* reduce check level for ci

---------

Co-authored-by: hughpu <hughpu@hotmail.com>
Co-authored-by: Hugh Pu <31498041+hughpu@users.noreply.github.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>