Lots of improvements (Still 2 allocators) #2449
Narsil
changed the title from "Only 1 allocator2" to "Lots of improvements (Still 2 allocators)" 1 year ago
Narsil
force pushed from 8be5e328 to c1572bfe 1 year ago
Making prefix/flashinfer the default and testing the full release tests.
60719bab
Include flashinfer in the docker.
9d4c5d39
Using prebuilt.
f2bdc650
Allowing window_left_size (dummy version).
f55278de
Disabling flashinfer/prefix caching on odd head_dim
cba59aca
Disable prefix caching for lora.
a6cd5fef
More specific codes.
f0b35f94
Update lock
ffb68411
Updating integration tests with new values with FI/FD.
ba1ce20c
Update cargo lock ?
17c8a5e5
Upgrade to 1.80 because of bitstream...
344fee0d
Everywhere 1.80
860b550c
Forgot last default place.
8d0220a6
Apply suggestions from code review
b80593bf
Updated flake lock
0bf4eb96
Tmp
5eb6ea00
Upgrade resolution system for fewer errors in resolution.
32f64163
Remove lambda for cleaner function.
c53968dc
Handling debugger.
682db34b
Override the env in server tests.
1568e825
Is this enough to make it work ?
f5182c18
This seems to be working.
26e5037d
Downgrade some logs.
27b566ba
Fixing the default for vlm.
e30fb254
Don't enable prefix caching on VLM just yet.
f1c07354
Change `add_special_tokens` in order to have the correct tokens for chat
7f1816a4
Fixing prefix caching for flashdecoding.
65b94a69
Update all models.
bb9769ed
Fixed flashinfer version.
55d984d7
add_special_tokens is internal only
9dacac3b
Fixing seqlen with the new vlms.
e0069a3a
Fixing the issue with `add_special_tokens` not being passed around.
2cf1f5c0
Fixing the test.
ccaf1d00
Narsil
force pushed from c70335f2 to ccaf1d00 1 year ago
Removing encoder_decoder (seq2seq).
8ac1ffa0
Update the chat test.
c6f1a612
Fixing the batching tokenization in flash causal lm.
0a609731
Truncating left for radix purposes.
e6ee67f3
Oops this doesn't belong here.
f8867479
Put back default pure shell.
12325564
Update server tests
8d018483
Only n_heads / process_group.size() are necessary.
8a4df6e1
Revert the integration tests change (seem linked to head_size
e7e03638
Adding error message when assert is violated.
9c839ca5
Fixing the free algorithm to handle times where the common prefix is
bef2f6bd
Apply suggestions from code review
4b375004
Update server/text_generation_server/layers/attention/common.py
d77f5f2e
Fix disabling prefix caching - Fix windowing checks.
9bfdac23
Revert the Cohere tokenizer change (for now using a revision instead).
0c00b949
Fmt.
b4126793
Narsil
merged e415b690 into main 1 year ago
Narsil
deleted the only_1_allocator2 branch 1 year ago