text-generation-inference
Lots of improvements (Still 2 allocators)
#2449
Merged

Narsil merged 49 commits into main from only_1_allocator2
Narsil changed the title from "Only 1 allocator2" to "Lots of improvements (Still 2 allocators)" 1 year ago
Narsil force-pushed from 8be5e328 to c1572bfe 1 year ago
Narsil Making prefix/flashinfer the default and testing the full release tests.
60719bab
Narsil Include flashinfer in the docker.
9d4c5d39
Narsil Using prebuilt.
f2bdc650
Narsil Allowing window_left_size (dummy version).
f55278de
Narsil Disabling flashinfer/prefix caching on odd head_dim
cba59aca
Narsil Disable prefix caching for lora.
a6cd5fef
Narsil More specific codes.
f0b35f94
Narsil Update lock
ffb68411
Narsil Updating integration tests with new values with FI/FD.
ba1ce20c
Narsil Update cargo lock ?
17c8a5e5
Narsil Upgrade to 1.80 because of bitstream...
344fee0d
Narsil Everywhere 1.80
860b550c
Narsil Forgot last default place.
8d0220a6
Narsil Apply suggestions from code review
b80593bf
Narsil Updated flake lock
0bf4eb96
Narsil Tmp
5eb6ea00
Narsil Upgrade resolution system for less errors in resolution.
32f64163
Narsil Remove lambda for cleaner function.
c53968dc
Narsil Handling debugger.
682db34b
Narsil Override the env in server tests.
1568e825
Narsil Is this enough to make it work ?
f5182c18
Narsil This seems to be working.
26e5037d
Narsil Downgrade some logs.
27b566ba
Narsil Fixing the default for vlm.
e30fb254
Narsil Don't enable prefix caching on VLM just yet.
f1c07354
Narsil Change `add_special_tokens` in order to have the correct tokens for chat
7f1816a4
Narsil Fixing prefix caching for flashdecoding.
65b94a69
Narsil Update all models.
bb9769ed
Narsil Fixed flashinfer version.
55d984d7
Narsil add_special_tokens is internal only
9dacac3b
Narsil Fixing seqlen with the new vlms.
e0069a3a
Narsil Fixing the issue with `add_special_tokens` not being passed around.
2cf1f5c0
Narsil Fixing the test.
ccaf1d00
Narsil force-pushed from c70335f2 to ccaf1d00 1 year ago
Narsil Removing encoder_decoder (seq2seq).
8ac1ffa0
Narsil Update the chat test.
c6f1a612
Narsil Fixing the batching tokenization in flash causal lm.
0a609731
Narsil Truncating left for radix purposes.
e6ee67f3
Narsil Oops this doesn't belong here.
f8867479
Narsil Put back default pure shell.
12325564
Narsil Update server tests
8d018483
Narsil Only n_heads / process_group.size() are necessary.
8a4df6e1
Narsil Revert the integration tests change (seem linked to head_size
e7e03638
Narsil Adding error message when assert is violated.
9c839ca5
Narsil Fixing the free algorithm to handle times where the common prefix is
bef2f6bd
OlivierDehaene commented on 2024-08-29
Narsil Apply suggestions from code review
4b375004
Narsil Update server/text_generation_server/layers/attention/common.py
d77f5f2e
Narsil Fix disabling prefix caching - Fix windowing checks.
9bfdac23
Narsil Revert the Cohere tokenizer change (for now using a revision instead).
0c00b949
Narsil Fmt.
b4126793
OlivierDehaene approved these changes on 2024-08-29
Narsil merged e415b690 into main 1 year ago
Narsil deleted the only_1_allocator2 branch 1 year ago
