huggingface/text-generation-inference

Reducing number of reps while autotuning.

Ubuntu committed 2 years ago

fb084094

OlivierDehaene committed 2 years ago

7de104b7

OlivierDehaene committed 2 years ago

a794c677

Working version.

Ubuntu committed 3 years ago

a86e4bf7

Tmp work for sharding to work properly.

Ubuntu committed 3 years ago

57a6cbff

Ubuntu committed 3 years ago

c5846ee7

Non local file.

Ubuntu committed 3 years ago

c126ca01

Some protection against sharding (illegal access because of g_idx)

Ubuntu committed 3 years ago

c3d12ae2

[WIP] Adding GPTQ support for llama

Ubuntu committed 3 years ago

2c9e1171

fix(server): fix multinomial implem in Sampling

OlivierDehaene committed 3 years ago

4f6d038c

feat(server): use cuda graph in logits warping (#302)

OlivierDehaene committed 3 years ago

Verified a6c18c39

fix(docker): remove CUDA_VERSION

OlivierDehaene committed 3 years ago

35ab6cfc

feat(server): use float16 (#304)

OlivierDehaene committed 3 years ago

Verified 745f596c

feat(server): shard token decode (#303)

OlivierDehaene committed 3 years ago

Verified 68e9d6ab

fix(docker): remove nvidia require cuda env (#310)

OlivierDehaene committed 3 years ago

Verified 15854044

fix(docker): fix nvidia env vars (#305)

OlivierDehaene committed 3 years ago

Verified 49cffad1

feat(server): optim flash causal lm decode_token (#285)

OlivierDehaene committed 3 years ago

Verified ad66f6ef

fix(docker): fix docker build (#299)

OlivierDehaene committed 3 years ago

Verified bc5c0723

feat(docker): add benchmarking tool to docker image (#298)

OlivierDehaene committed 3 years ago

Verified e2502822

feat(router): Adding response schema for compat_generate (#292)

gsaivinay committed 3 years ago

Verified 926fd9a0

fix(dockerfile): fix nvidia env vars (#297)

OlivierDehaene committed 3 years ago

Verified e9b01b34

feat(server): decrease convert RAM requirements (#286)

Narsil committed 3 years ago

Verified b4aa87db

chore: add `flash-attention` to docker ignore (#287)

Narsil committed 3 years ago

Verified 3314a46d

fix(server): fix convert (#284)

Narsil committed 3 years ago

Verified 690fc317

feat(launcher): Improve error message when download process fails. (#276)

Narsil committed 3 years ago

Verified e68509ad

fix(server): Removes the parallelism in file conversion (during download) (#275)

Narsil committed 3 years ago

Verified f08343d4

fix(launcher): handle hub branches (#278)

Narsil committed 3 years ago

Verified b4fe248b

fix(launcher): pass weights cache override to the download process (#274)

OlivierDehaene committed 3 years ago

Verified b67908e0

feat(server): support hf endpoint weight layout (#266)

OlivierDehaene committed 3 years ago

Verified 85aa7e2e

fix(server): fix typo in tokenizers decode (#269)

OlivierDehaene committed 3 years ago

Verified 4096000e
