DeepSpeed
f611c670 - bring back quantization and add different bits support

Committed 1 year ago
Author: Reza Yazdani
Changed files:
  • csrc/transformer/inference/csrc/dequantize.cu
  • csrc/transformer/inference/csrc/pt_binding.cpp
  • csrc/transformer/inference/includes/inference_cuda_layers.h
  • deepspeed/model_implementations/transformers/ds_llama2.py
  • deepspeed/module_inject/auto_tp.py
  • deepspeed/module_inject/containers/base.py
  • deepspeed/module_inject/containers/features/hybrid_engine.py
  • deepspeed/module_inject/containers/features/split_qkv.py
  • deepspeed/module_inject/replace_module.py
  • deepspeed/ops/transformer/inference/op_binding/qkv_gemm.py
  • deepspeed/ops/transformer/inference/op_binding/softmax_context.py