huggingface/text-generation-inference

delete the last no repeat processor from warpers

ErikKaum committed 1 year ago

12381b0b

satisfy compiler

ErikKaum committed 1 year ago

e29fc9e3

make nrns optional

Nathan Brake committed 1 year ago

01e61bf0

Nathan Brake committed 1 year ago

ae46faef

Add support for no_repeat_ngram_size

Nathan Brake committed 1 year ago

28e6a504

Use symmetric quantization in the `quantize` subcommand (#2120)

danieldk committed 1 year ago

Verified dbb23fbf

[fix] Modifying base in yarn embedding (#2212)

SeongBeomLEE committed 1 year ago

Verified c46eaf70

fix: append DONE message to chat stream (#2221)

drbh committed 1 year ago

Verified d789de32

Add support for FP8 on compute capability >=8.0, <8.9 (#2213)

danieldk committed 1 year ago

Verified cb150eb2

Move quantized weight handling out of the `Weights` class (#2194)

danieldk committed 1 year ago

Verified 8511669c

Updating the self check (#2209)

Narsil committed 1 year ago

Verified 4c976fb4

Fixed README ToC (#2196)

vinkamath committed 1 year ago

Verified f5ba9bfd

Adding sanity check to openapi docs.

Narsil committed 1 year ago

Verified fe710af2

Fix buildx cache + change runner type (#2176)

glegendre01 committed 1 year ago

Verified 5e2a3058

Fix nccl regression on PyTorch 2.3 upgrade (#2099)

fxmarty committed 1 year ago

Verified 4c50b6d0

feat: use model name as adapter id in chat endpoints (#2128)

drbh committed 1 year ago

Verified 87ebb647

update to metrics 0.23.0 or could work with metrics-exporter-promethe… (#2190)

sywangyi committed 1 year ago

Verified 58effe78

fix: python deserialization (#2178)

jaluma committed 1 year ago

Verified 16d9e505

add doc for intel gpus (#2181)

sywangyi committed 1 year ago

Verified 07e240ca

Falcon/DBRX: get correct number of key-value heads (#2205)

danieldk committed 1 year ago

Verified 5c7c9f13

Fix incorrect cache allocation with multi-query (#2203)

danieldk committed 1 year ago

Verified 153fcf77

hotfix: Fix number of KV heads (#2202)

danieldk committed 1 year ago

Verified cce475a9

fix dbrx & opt model prefix bug (#2201)

icyxp committed 1 year ago

Verified 521d0d99

Consistently take `prefix` in model constructors (#2191)

danieldk committed 1 year ago

Verified 05c094fc

GPTQ CI improvements (#2151)

danieldk committed 1 year ago

Verified 67ef0649

Fix Starcoder2 after refactor (#2189)

danieldk committed 1 year ago

Verified b67d4633

Hotfixing after refactor.

Narsil committed 1 year ago

853d4eb9

Refactor dead code - Removing all `flash_xxx.py` files. (#2166)

Narsil committed 1 year ago

Verified fb2f74e2

Adding "longrope" for Phi-3 (#2172) (#2179)

amihalik committed 1 year ago

Verified c6bcadf8

Preparing patch release. (#2186)

Narsil committed 1 year ago

Verified 245d3de9
