text-generation-inference
PR #438 (Merged): Inference support for GPTQ (llama + falcon tested) + Quantization script
Commits (19)
All 19 commits by Narsil, committed 2 years ago:

- [WIP] Inference support for GPTQ (llama at least)
- Removing dead code.
- Fixing the dockerfile (require triton + gcc for compiling).
- Typo.
- Adding quantization scripts.
- Functionning quantization script.
- Some fixes.
- Re-enabling dim=dim in TensorParallelColumn because llama.
- Neox.
- Fixing few things
- Fixing register bias + gptq_bits type.
- Falcon
- Tiny fixes for falcon.
- No one saw that, therefore it didn't happen.
- Remove lots of dead code, move triton to hard requirement
- Triton is actually a dependency of torch on linux.
- Typo.
- Santacoder GPTQ support (quantized model seems awful, not sure if it's […]
- Apply suggestions from code review
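For context on what the quantization script and the `gptq_bits` setting deal with: GPTQ checkpoints commonly store weights as 4-bit integers packed eight-per-`int32`, dequantized at runtime with a scale and zero point. A minimal sketch of that packing scheme, assuming NumPy and hypothetical function names (this is not the repository's actual code):

```python
import numpy as np


def pack_int4(qweight: np.ndarray) -> np.ndarray:
    """Pack 4-bit values (0..15) into uint32 words, eight values per word."""
    assert qweight.ndim == 1 and qweight.size % 8 == 0
    q = qweight.astype(np.uint32).reshape(-1, 8)
    packed = np.zeros(q.shape[0], dtype=np.uint32)
    for i in range(8):
        packed |= q[:, i] << np.uint32(4 * i)  # value i occupies bits [4i, 4i+4)
    return packed


def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the eight 4-bit values from each word."""
    out = np.empty((packed.size, 8), dtype=np.uint32)
    for i in range(8):
        out[:, i] = (packed >> np.uint32(4 * i)) & np.uint32(0xF)
    return out.reshape(-1)


def dequant(q: np.ndarray, scale: float, zero: float) -> np.ndarray:
    """Recover approximate float weights: w ≈ scale * (q - zero)."""
    return scale * (q.astype(np.float32) - zero)


# Round trip: packing then unpacking is lossless for values in [0, 15].
vals = np.arange(16, dtype=np.uint32)
assert np.array_equal(unpack_int4(pack_int4(vals)), vals)
```

The point of the packing is storage and bandwidth: a 4-bit weight matrix takes one eighth the memory of fp32, and the GPU kernel (Triton in this PR) unpacks and dequantizes on the fly during the matmul.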