feat(server): add flash attention llama #144
njhill commented on 2023-04-05
Commits:
71402ed4 wip
cd5d0a96 feat(server): add flash attention llama
45eacb78 patch qkv_rot
47e93409 optional rust validation
1dd2c24b rework validation
161e93a4 cleanup
30148b77 fix instrumentation
f9b09d96 hack
8604d370 trigger build
eb033e78 trigger build
cdc33ce6 allow disabling hf_transfer
c11e7741 improve decode
783bc64f fix concatenate
b5233f9c better decode
70637b41 use all tokens
01ab5df1 update transformers
c7dd00ea upgrade setuptools
6c96f37b fix tests
3c272aef fix test
7816a476 fix llama tokenizer
26fc232a fix tp
c3779fa8 remove profiling
11111250 better docker layer caching
e4ad3066 fmt
d7b92e37 correct commit
273f0ae4 update flash attention
146e0e27 add validation + decode of special tokens
82464709 fix truncation
4267378b fix validation error
af10275f use join_all instead
18e44a6a update prom metrics
23b55861 fix buckets
a3bdaca0 force as_secs
3795c19d minimum duration to 0.1 ms
7451196a fmt
a1a6b5cc Merge remote-tracking branch 'origin/main' into feat/flash_llama
c2beaa27 revert build
OlivierDehaene marked this pull request as ready for review 2 years ago
d7548aef add llama to readme
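Several of the commits above ("rework validation", "fix truncation", "fix validation error") touch input validation on the Rust side. As a rough illustration only, here is a minimal truncate-then-check sketch; the function name, argument names, and limits are assumptions, not the actual router code.

```rust
// Hypothetical sketch of the kind of truncation logic the "rework validation"
// and "fix truncation" commits touch; names and limits are assumptions, not
// the actual text-generation-inference router code.
fn truncate_input(
    mut input_ids: Vec<u32>,
    truncate: Option<usize>,
    max_input_length: usize,
) -> Result<Vec<u32>, String> {
    // Optional user-requested truncation keeps only the last `truncate` tokens.
    if let Some(truncate) = truncate {
        if input_ids.len() > truncate {
            input_ids = input_ids.split_off(input_ids.len() - truncate);
        }
    }
    // Reject inputs that are still longer than the configured maximum.
    if input_ids.len() > max_input_length {
        return Err(format!(
            "inputs must have less than {max_input_length} tokens, got {}",
            input_ids.len()
        ));
    }
    Ok(input_ids)
}

fn main() {
    let ids: Vec<u32> = (0..10).collect();
    // Keep only the last 4 tokens, then check against a max of 8.
    let kept = truncate_input(ids, Some(4), 8).unwrap();
    assert_eq!(kept, vec![6, 7, 8, 9]);
}
```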
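The "use join_all instead" commit points at awaiting per-shard requests concurrently rather than one after another. Below is a minimal sketch with futures::future::join_all, assuming a hypothetical send_to_shard helper and the tokio and futures crates; it is not the real gRPC client code.

```rust
// Sketch of replacing sequential awaits with futures::future::join_all.
// `send_to_shard` is a stand-in for an async call to one model shard.
// Requires the `futures` and `tokio` crates.
use futures::future::join_all;

async fn send_to_shard(shard_id: usize, payload: &str) -> Result<String, String> {
    // Placeholder for an async gRPC call to one model shard.
    Ok(format!("shard {shard_id} handled {payload}"))
}

#[tokio::main]
async fn main() -> Result<(), String> {
    let payload = "generate";
    // Build one future per shard and await them all concurrently instead of in a loop.
    let futures = (0..4).map(|shard_id| send_to_shard(shard_id, payload));
    let responses: Result<Vec<String>, String> = join_all(futures).await.into_iter().collect();
    println!("{:?}", responses?);
    Ok(())
}
```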
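The "force as_secs", "fix buckets", and "minimum duration to 0.1 ms" commits concern how latencies are reported to the Prometheus metrics. A std-only sketch of the idea, assuming durations are recorded in fractional seconds with a 0.1 ms floor; the actual metric names and bucket layout are not shown here.

```rust
// Sketch of the duration handling hinted at by the "force as_secs" and
// "minimum duration to 0.1 ms" commits: report latencies in seconds and clamp
// very small measurements so they still land in the lowest histogram bucket.
// The 0.0001 s floor matches the 0.1 ms in the commit message; the histogram
// itself is out of scope here.
use std::time::Instant;

fn observed_seconds(start: Instant) -> f64 {
    // Convert to fractional seconds rather than whole seconds, then apply the floor.
    start.elapsed().as_secs_f64().max(0.0001)
}

fn main() {
    let start = Instant::now();
    let value = observed_seconds(start);
    // `value` is what would be recorded into a Prometheus-style histogram.
    println!("duration_s = {value}");
    assert!(value >= 0.0001);
}
```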