text-generation-inference
Inference support for GPTQ (llama + falcon tested) + Quantization script
#438
Merged
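
For context on what the quantization script produces: GPTQ stores weights as low-bit integer codes plus per-group scales and zero-points, which are dequantized on the fly at inference time. The sketch below shows only the basic asymmetric 4-bit round-trip for one group of weights; it is an illustration, not the PR's code — real GPTQ additionally compensates rounding error using second-order (Hessian) information, and the names here are hypothetical.

```python
def quantize_group(weights):
    """Asymmetric 4-bit quantization of one group of weights.

    Returns (codes, scale, zero) where codes are integers in [0, 15].
    Illustrative sketch only; actual GPTQ kernels pack codes into
    int32 words and correct quantization error column by column.
    """
    wmin, wmax = min(weights), max(weights)
    scale = (wmax - wmin) / 15 or 1.0          # 16 levels for 4 bits
    zero = round(-wmin / scale)                 # integer zero-point
    codes = [max(0, min(15, round(w / scale + zero))) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Recover approximate float weights from 4-bit codes."""
    return [(c - zero) * scale for c in codes]
```

With group-wise scales (e.g. one scale per 128 weights, as in common GPTQ checkpoints), the round-trip error of each weight stays within roughly one quantization step.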

Commits
  • [WIP] Inference support for GPTQ (llama at least)
    Narsil committed 2 years ago
  • Removing dead code.
    Narsil committed 2 years ago
  • Fixing the dockerfile (require triton + gcc for compiling).
    Narsil committed 2 years ago
  • Typo.
    Narsil committed 2 years ago
  • Adding quantization scripts.
    Narsil committed 2 years ago
  • Functioning quantization script.
    Narsil committed 2 years ago
  • Some fixes.
    Narsil committed 2 years ago
  • Re-enabling dim=dim in TensorParallelColumn because llama.
    Narsil committed 2 years ago
  • Neox.
    Narsil committed 2 years ago
  • Fixing a few things.
    Narsil committed 2 years ago
  • Fixing register bias + gptq_bits type.
    Narsil committed 2 years ago
  • Falcon
    Narsil committed 2 years ago
  • Tiny fixes for falcon.
    Narsil committed 2 years ago
  • No one saw that, therefore it didn't happen.
    Narsil committed 2 years ago
  • Remove lots of dead code, move triton to hard requirement
    Narsil committed 2 years ago
  • Triton is actually a dependency of torch on linux.
    Narsil committed 2 years ago
  • Typo.
    Narsil committed 2 years ago
  • Santacoder GPTQ support (quantized model seems awful, not sure if it's …
    Narsil committed 2 years ago
  • Apply suggestions from code review
    Narsil committed 2 years ago