feat(server): Add exllama GPTQ CUDA kernel support #553 #666
add exllama gptq kernel
ee7ba48b
add attribution
c858d791
Merge branch 'main' into gptq-cuda-kernels
0ff8219f
some more cleanup
2272b3a4
Merge branch 'gptq-cuda-kernels' of https://github.com/fxmarty/text-g…
620ed7d8
try-catch to load the cuda extension, quite ugly practice tbh
a6e38740
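The "try-catch to load the cuda extension" commit describes a common pattern: attempt to import the compiled CUDA extension and fall back gracefully when it is missing. A minimal sketch of that pattern, where the module name `exllama_kernels` and the flag `HAS_EXLLAMA` are illustrative assumptions, not necessarily the PR's actual identifiers:

```python
# Guarded import of an optional compiled CUDA extension.
# If the extension was not built (e.g. no GPU toolchain at install
# time), we record its absence instead of crashing at import time.
try:
    import exllama_kernels  # hypothetical compiled extension module
    HAS_EXLLAMA = True
except ImportError:
    HAS_EXLLAMA = False

# Callers can then branch on HAS_EXLLAMA to pick a fallback code path.
```

The commit message itself calls this "quite ugly practice", but it is the standard way to keep optional native kernels from breaking CPU-only installs.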
have a single gptq quantization type
4462854e
move exllama buffer init to the top level
67a46b73
cleanup
67d68760
support bits different than 4
f90c61a3
tests
8645fd39
Merge branch 'main' into gptq-cuda-kernels
faa5b52f
fix test
38c2be59
fix tests
2ae65b45
support all, test llama
00360842
Merge branch 'main' into gptq-cuda-kernels
9401e102
fix the usual merge mess
74e6d6e5
Merge branch 'main' into gptq-cuda-kernels
edfbfdfb
fix per-column quantization
6bf7090e
Refactored a bit.
08603944
Small polish.
8cf7c899
Give escape hatch to not use exllama kernels even if available.
7faef690
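The "escape hatch" commit suggests a runtime switch that disables the exllama kernels even when they loaded successfully, typically via an environment variable. A hedged sketch of such a switch; the variable name `DISABLE_EXLLAMA` and the helper `use_exllama` are assumptions for illustration:

```python
import os

def use_exllama(kernels_available: bool) -> bool:
    """Decide whether to run the exllama kernels.

    kernels_available: whether the CUDA extension imported successfully.
    DISABLE_EXLLAMA (hypothetical name): user override to force the
    fallback path, useful for debugging or unsupported hardware.
    """
    disabled = os.environ.get("DISABLE_EXLLAMA", "false").lower() in ("1", "true")
    return kernels_available and not disabled
```

This keeps the fast path the default while letting users opt out without reinstalling.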
Fixing GPTQ device santacoder.
900ac494
Fix config.
12191b7e
Add kernel target.
c6e702fb
Separate build process.
3ec3adde
Update starcoder_gptq
40be5328
Wtf gh.
1dc952a6
Switching model for integration test llama gptq.
8b6a2625
Getting closer to the non gptq test (stop sequence doesn't work).
afb39404
Narsil changed the title from "Superseeds #553" to "feat(server): Add exllama GPTQ CUDA kernel support #553" 2 years ago
Narsil merged commit d5b5bc75 into main 2 years ago
Narsil deleted the gptq-cuda-kernels2 branch 2 years ago