text-generation-inference
feat(server): add flash attention llama #144 (Merged)

OlivierDehaene merged 38 commits into main from feat/flash_llama
OlivierDehaene force-pushed from 793a7d92 to 294dc65c 2 years ago
OlivierDehaene force-pushed from 294dc65c to dfc464ca 2 years ago
OlivierDehaene force-pushed from dfc464ca to 68f465a6 2 years ago
njhill commented on 2023-04-05
njhill commented on 2023-04-05
OlivierDehaene wip (71402ed4)
OlivierDehaene feat(server): add flash attention llama (cd5d0a96)
OlivierDehaene patch qkv_rot (45eacb78)
OlivierDehaene optional rust validation (47e93409)
OlivierDehaene rework validation (1dd2c24b)
OlivierDehaene cleanup (161e93a4)
OlivierDehaene fix instrumentation (30148b77)
OlivierDehaene hack (f9b09d96)
OlivierDehaene trigger build (8604d370)
OlivierDehaene trigger build (eb033e78)
OlivierDehaene allow disabling hf_transfer (cdc33ce6)
OlivierDehaene improve decode (c11e7741)
OlivierDehaene fix concatenate (783bc64f)
OlivierDehaene better decode (b5233f9c)
OlivierDehaene use all tokens (70637b41)
OlivierDehaene update transformers (01ab5df1)
OlivierDehaene force-pushed from 1056fd1c to 01ab5df1 2 years ago
OlivierDehaene upgrade setuptools (c7dd00ea)
OlivierDehaene fix tests (6c96f37b)
OlivierDehaene fix test (3c272aef)
OlivierDehaene fix llama tokenizer (7816a476)
OlivierDehaene fix tp (26fc232a)
OlivierDehaene remove profiling (c3779fa8)
OlivierDehaene better docker layer caching (11111250)
OlivierDehaene fmt (e4ad3066)
OlivierDehaene correct commit (d7b92e37)
OlivierDehaene update flash attention (273f0ae4)
OlivierDehaene add validation + decode of special tokens (146e0e27)
OlivierDehaene fix truncation (82464709)
OlivierDehaene fix validation error (4267378b)
OlivierDehaene use join_all instead (af10275f)
OlivierDehaene update prom metrics (18e44a6a)
OlivierDehaene fix buckets (23b55861)
OlivierDehaene force as_secs (a3bdaca0)
OlivierDehaene minimum duration to 0.1 ms (3795c19d)
OlivierDehaene fmt (7451196a)
OlivierDehaene Merge remote-tracking branch 'origin/main' into feat/flash_llama (a1a6b5cc)
OlivierDehaene revert build (c2beaa27)
OlivierDehaene marked this pull request as ready for review 2 years ago
OlivierDehaene add llama to readme (d7548aef)
OlivierDehaene merged 299217c9 into main 2 years ago
OlivierDehaene deleted the feat/flash_llama branch 2 years ago
