DeepSpeed
54110305 - Inference Checkpoints in V2 (#4664)

Commit
1 year ago
Inference Checkpoints in V2 (#4664)

Add capability to snapshot an engine and resume from it, reducing load times for large models. Includes new unit tests to validate this pipeline on a small scale.

---------

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Masahiro Tanaka <mtanaka@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
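
The snapshot-and-resume idea in the commit message can be illustrated with a small, library-free sketch. This is not DeepSpeed's API or on-disk format: the function names `save_snapshot` and `load_snapshot` and the file names `params.bin` and `index.json` are hypothetical, chosen only for illustration. The assumption is that already-prepared parameters are written once into a single flat binary file with a small index, so a later load can read them back directly instead of repeating the original preparation work.

```python
import json
import struct
from pathlib import Path


def save_snapshot(params, directory):
    """Write every parameter into one flat binary file plus a JSON index.

    `params` maps a parameter name to a list of floats (stored as float32).
    Hypothetical layout for illustration only, not DeepSpeed's format.
    """
    directory = Path(directory)
    index = {}
    offset = 0
    with open(directory / "params.bin", "wb") as f:
        for name, values in params.items():
            # Pack as little-endian float32 and remember where it starts.
            f.write(struct.pack(f"<{len(values)}f", *values))
            index[name] = {"offset": offset, "count": len(values)}
            offset += 4 * len(values)
    (directory / "index.json").write_text(json.dumps(index))


def load_snapshot(directory):
    """Rebuild the parameter dict from the flat file using the index."""
    directory = Path(directory)
    index = json.loads((directory / "index.json").read_text())
    blob = (directory / "params.bin").read_bytes()
    return {
        name: list(struct.unpack_from(f"<{entry['count']}f", blob, entry["offset"]))
        for name, entry in index.items()
    }
```

Calling `save_snapshot(params, some_dir)` once and `load_snapshot(some_dir)` on later runs round-trips the values at float32 precision. A single contiguous file with an offset index is used here because it allows one sequential read at load time; whether the commit itself works this way is not stated on this page.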
  • .github/workflows
    • nv-accelerate-v100.yml
    • nv-inference.yml
    • nv-lightning-v100.yml
    • nv-megatron.yml
    • nv-pre-compile-ops.yml
    • nv-torch-latest-cpu.yml
    • nv-torch-latest-v100.yml
    • nv-transformers-v100.yml
  • deepspeed/inference
    • __init__.py
    • v2
      • __init__.py
      • allocator.py
      • checkpoint
        • huggingface_engine.py
      • engine_factory.py
      • engine_v2.py
      • inference_parameter.py
      • model_implementations
        • __init__.py
        • common_architectures
          • __init__.py
        • common_parameters
          • embedding_parameters.py
          • invfreq_parameters.py
          • moe_parameters.py
          • qkv_parameters.py
        • flat_model_helpers.py
        • inference_model_base.py
        • inference_policy_base.py
        • inference_transformer_base.py
        • layer_container_base.py
        • llama_v2
          • __init__.py
          • llama_v2_containers.py
          • llama_v2_policy.py
        • mistral
          • __init__.py
          • policy.py
        • opt
          • __init__.py
          • policy.py
        • parameter_base.py
      • modules
        • implementations
          • linear
            • __init__.py
            • blas_fp_linear.py
            • cutlass_fp_linear.py
          • moe
            • cutlass_multi_gemm.py
            • gate_fn.py
            • test.py
          • post_norm
            • cuda_post_ln.py
          • pre_norm
            • cuda_pre_ln.py
            • cuda_pre_rms.py
        • interfaces
          • embedding_base.py
          • linear_base.py
          • moe_base.py
          • post_norm_base.py
          • pre_norm_base.py
      • ragged/csrc
        • ragged_ops.cpp
  • tests/unit/inference
    • kernels
      • __init__.py
      • core_ops
        • __init__.py
        • test_bias_activation.py
        • test_blas_linear.py
        • test_gated_activation.py
        • test_post_ln.py
        • test_pre_ln.py
        • test_rms_norm.py
      • cutlass_ops
        • __init__.py
        • test_moe_gemm.py
      • ragged_ops
        • __init__.py
        • ragged_testing_utils.py
        • test_atom_builder.py
        • test_blocked_flash.py
        • test_blocked_kv_copy.py
        • test_blocked_rotary_emb.py
        • test_logits_gather.py
        • test_moe_gather.py
        • test_moe_scatter.py
        • test_ragged_embed.py
        • test_top_1_gating.py
    • model_implementations
      • __init__.py
      • parameters
        • __init__.py
        • test_layer_inheritance.py
        • test_mapping.py
        • test_multi_parameter_layer.py
        • test_parameter_list.py
        • utils.py
      • sharding
        • __init__.py
        • test_attn_out_sharding.py
        • test_mlp_sharding.py
        • test_qkv_sharding.py
    • modules
      • __init__.py
      • test_blas_linear_module.py
      • test_blocked_attn.py
      • test_cuda_pre_ln_module.py
      • test_custom_module.py
      • test_cutlass_moe.py
      • test_post_ln_module.py
      • test_pre_rms_module.py
    • ragged
      • test_blocked_allocator.py
      • test_manager_configs.py
      • test_ragged_wrapper.py
    • v2/model_implementations/parameters
      • test_contiguify.py
      • test_layer_inheritance.py
      • test_mapping.py
      • test_multi_parameter_layer.py
      • test_parameter_list.py
      • utils.py