DeepSpeed
e46d808a - MoE inference + PR-MoE model support (#1705)

MoE inference + PR-MoE model support (#1705)

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Changed files:

  • MANIFEST_win.in
  • csrc/transformer/inference/csrc/gelu.cu
  • csrc/transformer/inference/csrc/pt_binding.cpp
  • csrc/transformer/inference/includes/context.h
  • csrc/transformer/inference/includes/custom_cuda_layers.h
  • deepspeed/__init__.py
  • deepspeed/inference/engine.py
  • deepspeed/module_inject/replace_module.py
  • deepspeed/module_inject/replace_policy.py
  • deepspeed/moe/experts.py
  • deepspeed/moe/layer.py
  • deepspeed/moe/sharded_moe.py
  • deepspeed/moe/utils.py
  • deepspeed/ops/transformer/__init__.py
  • deepspeed/ops/transformer/inference/__init__.py
  • deepspeed/ops/transformer/inference/moe_inference.py
  • deepspeed/ops/transformer/inference/transformer_inference.py
  • deepspeed/runtime/engine.py
  • deepspeed/runtime/fp16/fused_optimizer.py
  • deepspeed/runtime/utils.py
  • deepspeed/runtime/weight_quantizer.py
  • deepspeed/runtime/zero/stage_1_and_2.py
  • deepspeed/utils/groups.py
  • op_builder/__init__.py
  • op_builder/builder.py
  • setup.py
  • tests/unit/test_checkpointing.py
  • tests/unit/test_moe.py
  • version.txt
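The routing files touched by this commit (deepspeed/moe/sharded_moe.py, deepspeed/ops/transformer/inference/moe_inference.py) implement gated expert dispatch for inference. As a conceptual illustration only, here is a stdlib-only Python sketch of top-1 MoE gating: score experts for a token, pick the highest-probability expert, and scale that expert's output by its gate probability. All names and shapes here are hypothetical and this is not DeepSpeed's API.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top1_gate(gate_logits):
    # Return (index of the highest-scoring expert, its gate probability).
    probs = softmax(gate_logits)
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return idx, probs[idx]

def moe_forward(token, gate_weights, experts):
    # Route one token (a list of floats) through the single expert
    # chosen by the gate, scaling its output by the gate probability.
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    idx, p = top1_gate(logits)
    return [p * y for y in experts[idx](token)]

# Toy usage: two "experts" and a 2-feature token.
token = [1.0, 2.0]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]      # logits = [1.0, 2.0]
experts = [lambda t: [x + 1 for x in t],     # expert 0: add one
           lambda t: [2 * x for x in t]]     # expert 1: double
out = moe_forward(token, gate_weights, experts)
```

Top-1 gating keeps inference cheap: each token executes exactly one expert network, while the gate probability weights that expert's contribution.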