DeepSpeed
e669aaf5 - Trajepl/nebula ckpt engine (#2085)

Commit
3 years ago
Trajepl/nebula ckpt engine (#2085)

* enable checkpoint engine
* separated nebula config
* add __init__.py for nebula importing
* linter fix
* fix: ds_config is None
* fix: ds config
* fix: get sd loader fix
* align the API with torch raw code
* linter fix
* remove duplicate tag params
* make checkpoint_engine as required args
* fix args
* extract parameters out to config
* fix: load state dict
* separate load engine
* linter fix
* extract checkpoint engine to abstract class
* linter fix
* construct function args fix
* add docs for dev/customers
* linter fix
* remove load engine
* print->log_dist
* linter fix
* add tag flag to distinguish the loading order

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
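
The commit message mentions extracting the checkpoint engine into an abstract class. As a rough illustration of that idea only, the sketch below defines an abstract interface and one toy backend. It is not the code from this commit: the method names (`create`, `save`, `load`, `commit`), the `PickleCheckpointEngine` class, and the `roundtrip` helper are all assumptions chosen for the example, and the backend uses `pickle` rather than torch or Nebula so that it runs with the standard library alone.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod


class CheckpointEngine(ABC):
    """Hypothetical abstract interface; method names are illustrative assumptions."""

    def __init__(self, config_params=None):
        self.config_params = config_params

    @abstractmethod
    def create(self, tag):
        """Announce that a checkpoint identified by `tag` is about to be written."""

    @abstractmethod
    def save(self, state_dict, path):
        """Persist `state_dict` at `path`."""

    @abstractmethod
    def load(self, path, map_location=None):
        """Return the state dict stored at `path`."""

    @abstractmethod
    def commit(self, tag):
        """Mark the checkpoint identified by `tag` as complete."""


class PickleCheckpointEngine(CheckpointEngine):
    """Toy backend (assumption): stores state dicts with pickle on local disk."""

    def __init__(self, config_params=None):
        super().__init__(config_params)
        self.current_tag = None
        self.committed_tags = []

    def create(self, tag):
        self.current_tag = tag

    def save(self, state_dict, path):
        with open(path, "wb") as f:
            pickle.dump(state_dict, f)

    def load(self, path, map_location=None):
        # map_location is accepted for interface parity and ignored here.
        with open(path, "rb") as f:
            return pickle.load(f)

    def commit(self, tag):
        self.committed_tags.append(tag)
        return True


def roundtrip(engine, state_dict, tag="step_1"):
    """Save and reload a state dict through any CheckpointEngine backend."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, f"{tag}.pkl")
        engine.create(tag)
        engine.save(state_dict, path)
        engine.commit(tag)
        return engine.load(path)
```

Because callers such as `roundtrip` depend only on the abstract interface, a different storage backend could be swapped in without changing the calling code, which is presumably the motivation for the extraction described above.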