feat(server): Add exllama GPTQ CUDA kernel support #553 #666
add exllama gptq kernel
ee7ba48b
add attribution
c858d791
Merge branch 'main' into gptq-cuda-kernels
0ff8219f
some more cleanup
2272b3a4
Merge branch 'gptq-cuda-kernels' of https://github.com/fxmarty/text-g…
620ed7d8
try-catch to load the cuda extension, quite ugly practice tbh
a6e38740
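The "try-catch to load the cuda extension" commit describes a common pattern: attempt to import the compiled CUDA extension and fall back gracefully when it is missing. A minimal sketch of that pattern, where the module name `exllama_kernels` and the flag `HAS_EXLLAMA` are illustrative assumptions, not necessarily the PR's actual identifiers:

```python
# Guarded import of an optional compiled CUDA extension.
# If the extension was not built (e.g. no GPU toolchain at install
# time), we record its absence instead of crashing at import time.
try:
    import exllama_kernels  # hypothetical compiled extension module
    HAS_EXLLAMA = True
except ImportError:
    HAS_EXLLAMA = False

# Callers can then branch on HAS_EXLLAMA to pick a fallback code path.
```

The commit message itself calls this "quite ugly practice", but it is the standard way to keep optional native kernels from breaking CPU-only installs.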
have a single gptq quantization type
4462854e
move exllama buffer init to the top level
67a46b73
cleanup
67d68760
support bits different than 4
f90c61a3
tests
8645fd39
Merge branch 'main' into gptq-cuda-kernels
faa5b52f
fix test
38c2be59
fix tests
2ae65b45
support all, test llama
00360842
Merge branch 'main' into gptq-cuda-kernels
9401e102
fix the usual merge mess
74e6d6e5
Merge branch 'main' into gptq-cuda-kernels
edfbfdfb
fix per-column quantization
6bf7090e
Refactored a bit.
08603944
Small polish.
8cf7c899
Give escape hatch to not use exllama kernels even if available.
7faef690
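The "escape hatch" commit suggests a runtime switch that disables the exllama kernels even when they loaded successfully, typically via an environment variable. A hedged sketch of such a switch; the variable name `DISABLE_EXLLAMA` and the helper `use_exllama` are assumptions for illustration:

```python
import os

def use_exllama(kernels_available: bool) -> bool:
    """Decide whether to run the exllama kernels.

    kernels_available: whether the CUDA extension imported successfully.
    DISABLE_EXLLAMA (hypothetical name): user override to force the
    fallback path, useful for debugging or unsupported hardware.
    """
    disabled = os.environ.get("DISABLE_EXLLAMA", "false").lower() in ("1", "true")
    return kernels_available and not disabled
```

This keeps the fast path the default while letting users opt out without reinstalling.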
Fixing GPTQ device santacoder.
900ac494
Fix config.
12191b7e
Add kernel target.
c6e702fb
Separate build process.
3ec3adde
Update starcoder_gptq
40be5328
Wtf gh.
1dc952a6
Switching model for integration test llama gptq.
8b6a2625
Getting closer to the non gptq test (stop sequence doesn't work).
afb39404
Narsil changed the title from "Superseeds #553" to "feat(server): Add exllama GPTQ CUDA kernel support #553" 2 years ago
Narsil merged commit d5b5bc75 into main 2 years ago
Narsil deleted the gptq-cuda-kernels2 branch 2 years ago