DeepSpeed
f611c670
- bring back quantization and add different bits support
Commit
1 year ago
bring back quantization and add different bits support
References: #4351 - DS-Inference Quantization refresh: Fix several issues and add more features
Author: Reza Yazdani
Parents: 165042df
Files: 11
csrc/transformer/inference
    csrc
        dequantize.cu
        pt_binding.cpp
    includes
        inference_cuda_layers.h
deepspeed
    model_implementations/transformers
        ds_llama2.py
    module_inject
        auto_tp.py
        containers
            base.py
            features
                hybrid_engine.py
                split_qkv.py
        replace_module.py
    ops/transformer/inference/op_binding
        qkv_gemm.py
        softmax_context.py