huggingface/text-generation-inference
Commits
Branch: feat/better_tokens
fmt · OlivierDehaene committed 2 years ago · a4fd6905
feat(server): use encoding to get prefill tokens · OlivierDehaene committed 2 years ago · 83e442ca
fix(server): fix warpers on CPU (#472) · OlivierDehaene committed 2 years ago · Verified · 53aa9194
feat(server): improve flash attention import errors (#465) · OlivierDehaene committed 2 years ago · Verified · ece7ffa4
feat(router): add ngrok integration (#453) · OlivierDehaene committed 2 years ago · Verified · f59fb8b6
feat(server): pre-allocate past key values for flash causal LM (#412) · OlivierDehaene committed 2 years ago · Verified · 5ce89059
fix(makefile): Fix typo and use POSIX comparison in the makefile (#443) · piratos committed 2 years ago · Verified · ca650e5b
docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441) · antferdom committed 2 years ago · Verified · d4eb60f4
feat(server): optimize dist ops (#434) · OlivierDehaene committed 2 years ago · Verified · e496c9ba
feat(server): Rework model loading (#344) · Narsil committed 2 years ago · Verified · abd58ff8
chore: update openapi schema · OlivierDehaene committed 2 years ago · 19c41824
feat(server): batch tokenization for flash causal lm (#411) · OlivierDehaene committed 2 years ago · Verified · 6abec14a
feat(server): only compute prefill logprobs when asked (#406) · OlivierDehaene committed 2 years ago · Verified · 895c5f15
feat(launcher): parse oom signal (#404) · OlivierDehaene committed 2 years ago · Verified · 83b84486
feat(sagemaker): add trust remote code to entrypoint (#394) · OlivierDehaene committed 2 years ago · Verified · 62fc4010
v0.8.2 · OlivierDehaene committed 2 years ago · e7248fe9
feat(server): load santacoder/starcoder models with safetensors (#393) · OlivierDehaene committed 2 years ago · Verified · 95d35469
feat(server): remove trust_remote_code requirement for falcon models (#396) · OlivierDehaene committed 2 years ago · Verified · c0928e6f
fix(server): fix has_position_ids (#395) · OlivierDehaene committed 2 years ago · Verified · d69a0633
v0.8.1 · OlivierDehaene committed 2 years ago · db2ebe39
fix(server): fix bnb quantization for CausalLM models (#385) · OlivierDehaene committed 2 years ago · Verified · 337afb28
feat(server): add retry on download (#384) · OlivierDehaene committed 2 years ago · Verified · 87dc034b
increase health checks · OlivierDehaene committed 2 years ago · 444400b4
v0.8.0 · OlivierDehaene committed 2 years ago · 081b9265
feat(server): support RefinedWeb models (#379) · OlivierDehaene committed 2 years ago · Verified · b8b950b3
fix(server): fix quantization · OlivierDehaene committed 2 years ago · bf7f1d54
fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · OlivierDehaene committed 2 years ago · 49a6c8c1
fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES · OlivierDehaene committed 2 years ago · 146e72c3
Fix issue when load AutoModelForSeq2SeqLM model (#370) · CL-Shang committed 2 years ago · Verified · 5fde8d99
feat(server): support vectorized warpers in flash causal lm (#317) · OlivierDehaene committed 2 years ago · Verified · 62f91f78