text-generation-inference
Lots of improvements (Still 2 allocators)
#2449
Merged

Narsil merged 49 commits into main from only_1_allocator2
Narsil changed the title from "Only 1 allocator2" to "Lots of improvements (Still 2 allocators)" 1 year ago
Narsil force-pushed from 8be5e328 to c1572bfe 1 year ago
Narsil Making prefix/flashinfer the default and testing the full release tests.
60719bab
Narsil Include flashinfer in the docker.
9d4c5d39
Narsil Using prebuilt.
f2bdc650
Narsil Allowing window_left_size (dummy version).
f55278de
Narsil Disabling flashinfer/prefix caching on odd head_dim
cba59aca
Narsil Disable prefix caching for lora.
a6cd5fef
Narsil More specific codes.
f0b35f94
Narsil Update lock
ffb68411
Narsil Updating integration tests with new values with FI/FD.
ba1ce20c
Narsil Update cargo lock ?
17c8a5e5
Narsil Upgrade to 1.80 because of bitstream...
344fee0d
Narsil Everywhere 1.80
860b550c
Narsil Forgot last default place.
8d0220a6
Narsil Apply suggestions from code review
b80593bf
Narsil Updated flake lock
0bf4eb96
Narsil Tmp
5eb6ea00
Narsil Upgrade resolution system for less errors in resolution.
32f64163
Narsil Remove lambda for cleaner function.
c53968dc
Narsil Handling debugger.
682db34b
Narsil Override the env in server tests.
1568e825
Narsil Is this enough to make it work ?
f5182c18
Narsil This seems to be working.
26e5037d
Narsil Downgrade some logs.
27b566ba
Narsil Fixing the default for vlm.
e30fb254
Narsil Don't enable prefix caching on VLM just yet.
f1c07354
Narsil Change `add_special_tokens` in order to have the correct tokens for chat
7f1816a4
Narsil Fixing prefix caching for flashdecoding.
65b94a69
Narsil Update all models.
bb9769ed
Narsil Fixed flashinfer version.
55d984d7
Narsil add_special_tokens is internal only
9dacac3b
Narsil Fixing seqlen with the new vlms.
e0069a3a
Narsil Fixing the issue with `add_special_tokens` not being passed around.
2cf1f5c0
Narsil Fixing the test.
ccaf1d00
Narsil force-pushed from c70335f2 to ccaf1d00 1 year ago
Narsil Removing encoder_decoder (seq2seq).
8ac1ffa0
Narsil Update the chat test.
c6f1a612
Narsil Fixing the batching tokenization in flash causal lm.
0a609731
Narsil Truncating left for radix purposes.
e6ee67f3
Narsil Oops this doesn't belong here.
f8867479
Narsil Put back default pure shell.
12325564
Narsil Update server tests
8d018483
Narsil Only n_heads / process_group.size() are necessary.
8a4df6e1
Narsil Revert the integration tests change (seem linked to head_size
e7e03638
Narsil Adding error message when assert is violated.
9c839ca5
Narsil Fixing the free algorithm to handle times where the common prefix is
bef2f6bd
OlivierDehaene commented on 2024-08-29
Narsil Apply suggestions from code review
4b375004
Narsil Update server/text_generation_server/layers/attention/common.py
d77f5f2e
Narsil Fix disabling prefix caching - Fix windowing checks.
9bfdac23
Narsil Revert the Cohere tokenizer change (for now using a revision instead).
0c00b949
Narsil Fmt.
b4126793
OlivierDehaene approved these changes on 2024-08-29
Narsil merged e415b690 into main 1 year ago
Narsil deleted the only_1_allocator2 branch 1 year ago
