feat(server): add flash attention llama #144
njhill commented on 2023-04-05
Commits:
71402ed4 wip
cd5d0a96 feat(server): add flash attention llama
45eacb78 patch qkv_rot
47e93409 optional rust validation
1dd2c24b rework validation
161e93a4 cleanup
30148b77 fix instrumentation
f9b09d96 hack
8604d370 trigger build
eb033e78 trigger build
cdc33ce6 allow disabling hf_transfer
c11e7741 improve decode
783bc64f fix concatenate
b5233f9c better decode
70637b41 use all tokens
01ab5df1 update transformers
c7dd00ea upgrade setuptools
6c96f37b fix tests
3c272aef fix test
7816a476 fix llama tokenizer
26fc232a fix tp
c3779fa8 remove profiling
11111250 better docker layer caching
e4ad3066 fmt
d7b92e37 correct commit
273f0ae4 update flash attention
146e0e27 add validation + decode of special tokens
82464709 fix truncation
4267378b fix validation error
af10275f use join_all instead
18e44a6a update prom metrics
23b55861 fix buckets
a3bdaca0 force as_secs
3795c19d minimum duration to 0.1 ms
7451196a fmt
a1a6b5cc Merge remote-tracking branch 'origin/main' into feat/flash_llama
c2beaa27 revert build
OlivierDehaene marked this pull request as ready for review 2 years ago
d7548aef add llama to readme
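Several of the commits above ("rework validation", "fix truncation", "fix validation error") touch input validation on the Rust side. As a rough illustration only, here is a minimal truncate-then-check sketch; the function name, argument names, and limits are assumptions, not the actual router code.

```rust
// Hypothetical sketch of the kind of truncation logic the "rework validation"
// and "fix truncation" commits touch; names and limits are assumptions, not
// the actual text-generation-inference router code.
fn truncate_input(
    mut input_ids: Vec<u32>,
    truncate: Option<usize>,
    max_input_length: usize,
) -> Result<Vec<u32>, String> {
    // Optional user-requested truncation keeps only the last `truncate` tokens.
    if let Some(truncate) = truncate {
        if input_ids.len() > truncate {
            input_ids = input_ids.split_off(input_ids.len() - truncate);
        }
    }
    // Reject inputs that are still longer than the configured maximum.
    if input_ids.len() > max_input_length {
        return Err(format!(
            "inputs must have less than {max_input_length} tokens, got {}",
            input_ids.len()
        ));
    }
    Ok(input_ids)
}

fn main() {
    let ids: Vec<u32> = (0..10).collect();
    // Keep only the last 4 tokens, then check against a max of 8.
    let kept = truncate_input(ids, Some(4), 8).unwrap();
    assert_eq!(kept, vec![6, 7, 8, 9]);
}
```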
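The "use join_all instead" commit points at awaiting per-shard requests concurrently rather than one after another. Below is a minimal sketch with futures::future::join_all, assuming a hypothetical send_to_shard helper and the tokio and futures crates; it is not the real gRPC client code.

```rust
// Sketch of replacing sequential awaits with futures::future::join_all.
// `send_to_shard` is a stand-in for an async call to one model shard.
// Requires the `futures` and `tokio` crates.
use futures::future::join_all;

async fn send_to_shard(shard_id: usize, payload: &str) -> Result<String, String> {
    // Placeholder for an async gRPC call to one model shard.
    Ok(format!("shard {shard_id} handled {payload}"))
}

#[tokio::main]
async fn main() -> Result<(), String> {
    let payload = "generate";
    // Build one future per shard and await them all concurrently instead of in a loop.
    let futures = (0..4).map(|shard_id| send_to_shard(shard_id, payload));
    let responses: Result<Vec<String>, String> = join_all(futures).await.into_iter().collect();
    println!("{:?}", responses?);
    Ok(())
}
```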
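The "force as_secs", "fix buckets", and "minimum duration to 0.1 ms" commits concern how latencies are reported to the Prometheus metrics. A std-only sketch of the idea, assuming durations are recorded in fractional seconds with a 0.1 ms floor; the actual metric names and bucket layout are not shown here.

```rust
// Sketch of the duration handling hinted at by the "force as_secs" and
// "minimum duration to 0.1 ms" commits: report latencies in seconds and clamp
// very small measurements so they still land in the lowest histogram bucket.
// The 0.0001 s floor matches the 0.1 ms in the commit message; the histogram
// itself is out of scope here.
use std::time::Instant;

fn observed_seconds(start: Instant) -> f64 {
    // Convert to fractional seconds rather than whole seconds, then apply the floor.
    start.elapsed().as_secs_f64().max(0.0001)
}

fn main() {
    let start = Instant::now();
    let value = observed_seconds(start);
    // `value` is what would be recorded into a Prometheus-style histogram.
    println!("duration_s = {value}");
    assert!(value >= 0.0001);
}
```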