text-generation-inference
feat(server): Add exllama GPTQ CUDA kernel support #553
#666
Merged

Narsil merged 30 commits into main from gptq-cuda-kernels2
fxmarty add exllama gptq kernel
ee7ba48b
fxmarty add attribution
c858d791
fxmarty Merge branch 'main' into gptq-cuda-kernels
0ff8219f
fxmarty some more cleanup
2272b3a4
fxmarty Merge branch 'gptq-cuda-kernels' of https://github.com/fxmarty/text-g…
620ed7d8
fxmarty try-catch to load the cuda extension, quite ugly practice tbh
a6e38740
fxmarty have a single gptq quantization type
4462854e
fxmarty move exllama buffer init to the top level
67a46b73
fxmarty cleanup
67d68760
fxmarty support bits different than 4
f90c61a3
fxmarty tests
8645fd39
fxmarty Merge branch 'main' into gptq-cuda-kernels
faa5b52f
fxmarty fix test
38c2be59
fxmarty fix tests
2ae65b45
fxmarty support all, test llama
00360842
fxmarty Merge branch 'main' into gptq-cuda-kernels
9401e102
fxmarty fix the usual merge mess
74e6d6e5
fxmarty Merge branch 'main' into gptq-cuda-kernels
edfbfdfb
fxmarty fix per-column quantization
6bf7090e
Narsil Refactored a bit.
08603944
Narsil Small polish.
8cf7c899
Narsil Give escape hatch to not use exllama kernels even if available.
7faef690
Narsil Fixing GTPQ device santacoder.
900ac494
Narsil requested a review from OlivierDehaene 2 years ago
Narsil Fix config.
12191b7e
Narsil Add kernel target.
c6e702fb
Narsil Separate build process.
3ec3adde
Narsil Update starcoder_gptq
40be5328
Narsil Wtf gh.
1dc952a6
Narsil Switching model for integration test llama gptq.
8b6a2625
Narsil Getting closer to the non gptq test (stop sequence doesn't work).
afb39404
Narsil changed the title from "Superseeds #553" to "feat(server): Add exllama GPTQ CUDA kernel support #553" 2 years ago
OlivierDehaene approved these changes on 2023-07-21
Narsil merged d5b5bc75 into main 2 years ago
Narsil deleted the gptq-cuda-kernels2 branch 2 years ago
