DeepSpeed
c00388a2 - Mixtral FastGen Support (#4828)

Commit
1 year ago
Mixtral FastGen Support (#4828)

Adds support for Mixtral with FastGen. Key features implemented:

1. Top-2 MoE support
2. Better support for RoPE thetas
3. The mistral model implementation

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
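For readers unfamiliar with the "Top-2 MoE" item in the commit message, the following is a minimal pure-Python sketch of top-k gating for a single token. It is not the code added by this commit, which implements gating as CUDA kernels under kernels/ragged_ops/top_k_gating. The function name `top_k_gating_sketch` is hypothetical, and renormalizing the selected weights so they sum to 1 is an assumption based on how Mixtral-style routers are commonly described.

```python
import math


def top_k_gating_sketch(logits, k=2):
    """Pick the k highest-scoring experts for one token.

    `logits` holds one router score per expert. Returns a pair
    (expert_indices, weights), ordered from highest to lowest score.
    Hypothetical helper for illustration only.
    """
    # Numerically stable softmax over all experts.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the k largest probabilities, highest first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Assumption: renormalize so the selected weights sum to 1.
    norm = sum(probs[i] for i in top)
    return top, [probs[i] / norm for i in top]


experts, weights = top_k_gating_sketch([0.1, 2.0, -1.0, 1.0])
```

Here the token would be routed to experts 1 and 3, and each expert's output would be scaled by its weight before the two outputs are summed.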
  • deepspeed/inference/v2
    • checkpoint
      • huggingface_engine.py
    • engine_factory.py
    • kernels/ragged_ops
      • __init__.py
      • includes
        • top_k_utils.h
      • linear_blocked_kv_rotary
        • blocked_kv_rotary.cpp
        • blocked_kv_rotary.cu
        • blocked_kv_rotary.cuh
        • blocked_kv_rotary.h
        • blocked_kv_rotary.py
      • moe_gather
        • moe_gather.cpp
        • moe_gather.cu
        • moe_gather.cuh
        • moe_gather.h
        • moe_gather.py
      • moe_scatter
        • moe_scatter.cpp
        • moe_scatter.cu
        • moe_scatter.cuh
        • moe_scatter.py
      • ragged_ops.cpp
      • top_k_gating
        • __init__.py
        • top_k_gating.cpp
        • top_k_gating.cu
        • top_k_gating.cuh
        • top_k_gating.h
        • top_k_gating.py
    • model_implementations
      • __init__.py
      • common_parameters
        • moe_parameters.py
      • falcon
        • __init__.py
        • container.py
        • model.py
        • policy.py
      • inference_transformer_base.py
      • llama_v2
        • __init__.py
        • container.py
        • model.py
        • policy.py
      • mistral
        • model.py
        • policy.py
      • mixtral
        • __init__.py
        • container.py
        • model.py
        • policy.py
      • opt
        • container.py
        • model.py
        • policy.py
    • modules
      • configs
        • __init__.py
        • attention_configs.py
        • moe_config.py
      • implementations
        • attention
          • dense_blocked_attention.py
        • moe
          • cutlass_multi_gemm.py
  • op_builder
    • ragged_ops.py
  • tests/unit/inference/v2
    • kernels/ragged_ops
      • test_moe_gather.py
      • test_moe_scatter.py
      • test_top_k_gating.py
    • model_implementations/parameters
      • test_parameter_list.py
    • modules
      • test_blocked_attn.py
      • test_cutlass_moe.py