optimum
53240c3f - Allow GPTQModel to auto select Marlin or faster kernels for inference only ops (#2138)

Commit

340 days ago

Allow GPTQModel to auto select Marlin or faster kernels for inference only ops (#2138)

* select quant_linear with pack
* up GPTQMODEL_MINIMUM_VERSION
* Update quantizer.py
* update gptqmodel version

---------

Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>

Author

LRL-ModelCloud

Parents

72498dde
