LLaMA Model Optimization #18021
e74b899e  Initial fusions and kernel changes for LLaMA
228de8ca  Add rotary embeddings for LLaMA
dc16e164  Change input shapes and types for fused model
816f7e94  Add present KV to multi-head attention
5ce8e5a6  Merge branch 'main' into kvaishnavi/llama
6669899b  Update benchmark scripts
ed61ae48  Update inputs for optimized model
cdbd4664  Merge branch 'main' into kvaishnavi/llama
becbd302  Add interleaved and non-interleaved rotary embeddings
eece5e82  Update rotary embeddings and export scripts
55d05547  Fix attention mask for HF version
37e6b5fd  Modify rotary embeddings fusion for merged HF model
909f8e76  Add optimization passes after conversion
43f459bb  Fix adding GQA to optimized model
4e2bf415  Add CPU implementation for rotary embeddings
2210c476  Add test cases
6f154e30  Clean up test cases
822c2e60  Fix initializer data in test case
cdf55360  Add merged export
52f59949  Remove logger warning
0d176567  Update docs
bcb5a32d  Enable buffer sharing and int4 quantization
8ae9188c  Fix inputs for buffer sharing
143d8057  Remove extra print
f2b46448  Clean up code
d7bb72c9  Merge branch 'main' into kvaishnavi/llama
8968bb3d  Address PR feedback
84f7cc09  Add changes suggested by linters
99ec3410  Fix min CUDA architecture
b76e2c2b  Add graph input for GQA
edafef50  Fix GQA parity issue
7b829122  Add changes suggested by linter
a8913986  Remove unreferenced parameter
716b7253  Change rotary embedding test threshold
6b8698d4  Add int4 CPU support
cc0199b2  Add changes suggested by linters
e38ecb3b  Merge branch 'main' into kvaishnavi/llama
e69c23b5  Fix linter issue
d14d5bdb  Fix CodeQL error
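Several of the commits above add rotary embedding support, in both interleaved and non-interleaved layouts. The following is a minimal NumPy sketch of the operation itself, not the ONNX Runtime kernel added by this PR; the function name and signature are illustrative:

```python
import numpy as np

def rotary_embedding(x, pos, base=10000.0, interleaved=False):
    """Apply rotary position embeddings to x of shape (seq_len, head_dim).

    interleaved=True rotates adjacent pairs (x0, x1), (x2, x3), ...;
    interleaved=False rotates (x_i, x_{i + head_dim/2}) pairs
    (the GPT-NeoX-style half-split layout).
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per pair of dimensions.
    inv_freq = base ** (-np.arange(half) / half)          # (half,)
    angles = np.outer(pos, inv_freq)                      # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)

    if interleaved:
        x1, x2 = x[:, 0::2], x[:, 1::2]
    else:
        x1, x2 = x[:, :half], x[:, half:]

    # Standard 2-D rotation of each (x1, x2) pair.
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos

    out = np.empty_like(x)
    if interleaved:
        out[:, 0::2], out[:, 1::2] = r1, r2
    else:
        out[:, :half], out[:, half:] = r1, r2
    return out
```

Because each pair is only rotated, the per-token norm is unchanged, and position 0 is the identity, which is a handy sanity check when comparing a fused kernel against a reference.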
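The PR also enables int4 weight quantization (bcb5a32d, 6b8698d4). As a rough illustration of the idea, here is a symmetric blockwise int4 scheme sketched in NumPy; this is an assumption-laden toy, not the quantizer ONNX Runtime ships:

```python
import numpy as np

def quantize_int4_blockwise(w, block_size=32):
    """Symmetric blockwise int4 quantization of a 1-D fp32 weight array.

    Each block of `block_size` values shares one fp32 scale; quantized
    values lie in the signed 4-bit range [-8, 7].
    """
    w = w.reshape(-1, block_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0          # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4_blockwise(q, scales):
    """Recover an fp32 approximation of the original weights."""
    return (q.astype(np.float32) * scales).reshape(-1)
```

With symmetric rounding, the per-element reconstruction error is bounded by half the block's scale, which is why smaller block sizes trade memory for accuracy.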
tianleiwu approved these changes on 2023-10-23.
faxu added the triage:approved label.
faxu added the sdxl_llama label.