Use `non_reentrant_checkpoint` to lift the requirement that inputs have `requires_grad=True` for activation-checkpointed layers in pipeline training. (#4224)
* feat: add `non_reentrant_checkpoint`
* feat: add missing output postprocessing and change the hook to record references to leaf forward tensors
* fix: register the multi_grad_hook after graph construction
* fix: backward compatibility for multi_tensor_hook
* fix: nonlocal reference error of deepspeed_saved_tensors
* fix: reduce repeating hook registration
* test: add test for `activation_checkpointing.checkpointing.non_reentrant_checkpoint`
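The core idea behind the checkpointing above is recomputation: the forward pass keeps only a block's input and rebuilds the dropped activations during backward. A minimal framework-free sketch of that idea (toy code with hypothetical names, not the DeepSpeed `non_reentrant_checkpoint` API):

```python
import math

class CheckpointedBlock:
    """Toy activation checkpoint: forward keeps only the input; the
    intermediate activation is recomputed on demand during backward."""

    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2  # scalar "weights" of two layers

    def _hidden(self, x):
        # Layer 1: the activation we choose NOT to store.
        return math.tanh(self.w1 * x)

    def forward(self, x):
        self._saved_input = x  # store the input only, drop the activation
        return self.w2 * self._hidden(x)

    def backward(self, dy):
        # Recompute the dropped activation from the saved input,
        # then backpropagate through both layers by hand.
        x = self._saved_input
        h = self._hidden(x)
        dw2 = dy * h                         # grad w.r.t. w2
        dh = dy * self.w2                    # grad w.r.t. hidden activation
        dw1 = dh * (1.0 - h * h) * x         # grad w.r.t. w1 through tanh
        dx = dh * (1.0 - h * h) * self.w1    # grad w.r.t. the input
        return dx, dw1, dw2
```

The point of the non-reentrant variant in this PR is that recomputation is wired into the live autograd graph via saved-tensor hooks rather than a nested backward call, so parameter gradients flow even when the block's input itself does not require grad.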
* Pass correct node size for ZeRO++ (#4085)
* Pass correct node size
* formatting
---------
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* add deepspeed chat arxiv report (#4110)
* add deepspeed chat arxiv report
* add zeroquant v2 and fp
* add selective enhancement
* add ignore for 'Youn' in spell checker
---------
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* style: fix flake8-detected style mismatches
* test: hack to clone the `test_activation_checkpointing` module for reuse and add regression tests
* doc: explain the introduction of `non_reentrant_checkpoint`
* doc: explain the test of `non_reentrant_checkpoint`
* apply non_reentrant_checkpoint in pipeline parallel training
* unit tests pass
* fix CI
* reduce check level for CI
---------
Co-authored-by: hughpu <hughpu@hotmail.com>
Co-authored-by: Hugh Pu <31498041+hughpu@users.noreply.github.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>