onnxruntime
LLaMA Model Optimization
#18021
Merged


kunal-vaishnavi Initial fusions and kernel changes for LLaMA
e74b899e
kunal-vaishnavi Add rotary embeddings for LLaMA
228de8ca
kunal-vaishnavi Change input shapes and types for fused model
dc16e164
kunal-vaishnavi Add present kv to multi-head attention
816f7e94
kunal-vaishnavi Merge branch 'main' into kvaishnavi/llama
5ce8e5a6
kunal-vaishnavi Update benchmark scripts
6669899b
kunal-vaishnavi Update inputs for optimized model
ed61ae48
kunal-vaishnavi Merge branch 'main' into kvaishnavi/llama
cdbd4664
kunal-vaishnavi Add interleaved and non-interleaved rotary embeddings
becbd302
kunal-vaishnavi Update rotary embeddings and export scripts
eece5e82
kunal-vaishnavi Fix attention mask for HF version
55d05547
kunal-vaishnavi Modify rotary embeddings fusion for merged HF model
37e6b5fd
kunal-vaishnavi Add optimization passes after conversion
909f8e76
kunal-vaishnavi Fix adding GQA to optimized model
43f459bb
kunal-vaishnavi Add CPU implementation for rotary embeddings
4e2bf415
kunal-vaishnavi Add test cases
2210c476
kunal-vaishnavi Clean up test cases
6f154e30
kunal-vaishnavi Fix initializer data in test case
822c2e60
kunal-vaishnavi Add merged export
cdf55360
kunal-vaishnavi Remove logger warning
52f59949
kunal-vaishnavi Update docs
0d176567
kunal-vaishnavi Enable buffer sharing and int4 quantization
bcb5a32d
kunal-vaishnavi Fix inputs for buffer sharing
8ae9188c
kunal-vaishnavi Remove extra print
143d8057
kunal-vaishnavi Clean up code
f2b46448
kunal-vaishnavi Merge branch 'main' into kvaishnavi/llama
d7bb72c9
github-advanced-security commented on 2023-10-18
github-advanced-security commented on 2023-10-18
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
kunal-vaishnavi Address PR feedback
8968bb3d
kunal-vaishnavi Add changes suggested by linters
84f7cc09
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
tianleiwu commented on 2023-10-19
kunal-vaishnavi Fix min CUDA architecture
99ec3410
kunal-vaishnavi Add graph input for GQA
b76e2c2b
justinchuby commented on 2023-10-19
justinchuby commented on 2023-10-19
kunal-vaishnavi Fix GQA parity issue
edafef50
kunal-vaishnavi Add changes suggested by linter
7b829122
kunal-vaishnavi Remove unreferenced parameter
a8913986
github-advanced-security commented on 2023-10-20
kunal-vaishnavi Change rotary embedding test threshold
716b7253
kunal-vaishnavi Add int4 CPU support
6b8698d4
kunal-vaishnavi Add changes suggested by linters
cc0199b2
github-advanced-security commented on 2023-10-20
kunal-vaishnavi Merge branch 'main' into kvaishnavi/llama
e38ecb3b
kunal-vaishnavi Fix linter issue
e69c23b5
kunal-vaishnavi Fix CodeQL error
d14d5bdb
trajepl commented on 2023-10-23
tianleiwu approved these changes on 2023-10-23
kunal-vaishnavi merged 2a17d5cf into main 2 years ago
mindest commented on 2023-10-24
yufenglee added release:1.16.2
faxu added triage:approved
faxu added sdxl_llama
tianleiwu removed triage:approved
tianleiwu removed release:1.16.2
tianleiwu removed sdxl_llama
