text-generation-inference
90a1d04a - Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)

Commit

1 year ago

Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)

This change adds support for MoE models that use GPTQ quantization. Currently, only models with the following properties are supported:

- No `desc_act` with tensor parallelism, unless `group_size=-1`.
- No asymmetric quantization.
- No AWQ.
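The restrictions listed in the commit message can be read as a simple eligibility check. The following is a minimal sketch of that reading, not the code from this commit: `QuantConfig`, `unsupported_reason`, and the `world_size` parameter are hypothetical names introduced for illustration, and the field names merely mirror the settings mentioned above (`desc_act`, `group_size`, plus an assumed `sym` flag for symmetric quantization). Tensor parallelism is assumed to mean `world_size > 1`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuantConfig:
    """Hypothetical container for the quantization settings named in the commit message."""

    quant_method: str  # e.g. "gptq" or "awq"
    desc_act: bool     # activation-order ("act-order") quantization
    group_size: int    # -1 means no grouping
    sym: bool          # True for symmetric quantization


def unsupported_reason(cfg: QuantConfig, world_size: int) -> Optional[str]:
    """Return why a config falls outside the stated limits, or None if it fits.

    Illustrative only: this encodes the three bullet points of the commit
    message and is not the check performed by text-generation-inference.
    """
    if cfg.quant_method == "awq":
        return "AWQ is not supported"
    if cfg.quant_method != "gptq":
        return "only GPTQ quantization is covered by this change"
    if not cfg.sym:
        return "asymmetric quantization is not supported"
    # Assumption: tensor parallelism means more than one shard.
    if cfg.desc_act and world_size > 1 and cfg.group_size != -1:
        return "desc_act with tensor parallelism requires group_size=-1"
    return None
```

For example, under these assumptions a symmetric GPTQ config with `desc_act=True` and `group_size=128` passes on a single shard but is rejected once `world_size` is greater than 1.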

References

#2557 - Add support for GPTQ-quantized MoE models using MoE Marlin

Author

danieldk

Parents

f9e561ec
