[WebGPU EP] Support GroupQueryAttention #22658
satyajandhyala marked this pull request as ready for review 1 year ago
Added attention_common.h
0a5d2129
wip
5bfa0705
Fix compilation errors
e6615e9e
lint
449afb4d
Modified MultiHeadAttention to not derive from AttentionBase class
8d104726
Uncomment GQA registration
4ea58d1e
Moved TransferBSToBNSH and ApplyAttention declaration to attention_co…
4bcf257a
Revert "Modified MultiHeadAttention to not derive from AttentionBase …
5c5c9344
Converted CheckInput function to template to fix compiler/linker mult…
e7165469
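For context on the multiple-definition problem this commit works around: a helper defined in a header that is included by both the MultiHeadAttention and GroupQueryAttention translation units gets emitted twice and fails to link. Below is a minimal, self-contained sketch of the idea only; the parameter struct and body are hypothetical and this is not the PR's actual CheckInputs signature.

```cpp
// checkinputs_sketch.h -- illustration only, not the PR's code.
#pragma once
#include <stdexcept>

// If this were a plain (non-inline) function defined in a shared header, each
// .cc file including it would emit its own copy and the linker would report a
// multiple-definition error. As a template, every instantiation is implicitly
// inline, so the duplicate copies are folded into one.
template <typename ParametersT>
void CheckInputs(int num_heads, int kv_num_heads, ParametersT& params) {
  if (kv_num_heads == 0 || num_heads % kv_num_heads != 0) {
    throw std::invalid_argument("num_heads must be divisible by kv_num_heads");
  }
  params.num_heads = num_heads;        // hypothetical fields on the caller's
  params.kv_num_heads = kv_num_heads;  // parameter struct
}
```

Marking the function `inline` would also satisfy the one-definition rule; a template additionally lets each operator pass its own parameter type.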
lint
aba59e5a
Fixed conflicts.
067ecd18
Fixed copying errors.
53f1c78d
Fixed InplaceSoftmax dispatch
f4dc9fc6
Initialize required parameter data
3d1af1c6
Map total_seqlen_tensor input to CPU
2eaeebc3
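For readers unfamiliar with the pattern: pinning a kernel input to host memory is done at kernel registration time in ONNX Runtime. The sketch below shows the general shape of such a registration; the type-constraint helper and the input index 6 for total_sequence_length are assumptions based on the GroupQueryAttention contrib-op spec, not code copied from this PR.

```cpp
// Sketch of a WebGPU EP kernel registration that keeps one input on the CPU.
ONNX_OPERATOR_KERNEL_EX(
    GroupQueryAttention,
    kMSDomain,
    1,
    kWebGpuExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T", WebGpuSupportedFloatTypes())  // assumed helper
        // total_sequence_length is a scalar read on the host (e.g. to decide
        // whether this is the first prompt pass), so keep it in CPU memory
        // instead of uploading it to a GPU buffer.
        .InputMemoryType(OrtMemTypeCPUInput, 6),  // index per the op spec
    GroupQueryAttention);
```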
Use uniforms variable name consistently to avoid confusion.
9c828ccb
Keep InplaceSoftmax dispatch 3-dim.
26caa060
Formatting changes.
64b093f6
Use total_seqlen_tensor input only to determine is_first_prompt.
a8bd38bf
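A minimal sketch, with hypothetical names, of how the prompt-vs-decode decision can be derived from that CPU-resident scalar:

```cpp
#include <cstdint>

// Sketch only: the real kernel reads these values from ORT tensors/parameters.
bool IsFirstPrompt(int32_t total_sequence_length, int32_t sequence_length) {
  // With no KV cache yet, everything seen so far is the current prompt, so the
  // total length equals the current query's sequence length; otherwise we are
  // decoding against past tokens.
  return total_sequence_length == sequence_length;
}
```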
initialize is_packed_qkv_
d613df42
Handle past key/value and present key/value buffer sharing.
0fedb9fa
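The commit message does not show the mechanism, but the usual contract is: with past_present_share_buffer, past and present key/value alias one pre-allocated buffer sized for the maximum sequence length and only the new tokens are appended; without sharing, the past cache is first copied into the freshly allocated present buffer. An illustrative CPU-side sketch of that branching (the real kernel does this with GPU buffers and shaders):

```cpp
#include <cstddef>
#include <cstring>

// Illustration of the two KV-cache layouts the kernel has to handle.
void AppendKv(const float* past, size_t past_len,
              const float* new_kv, size_t new_len,
              float* present, size_t head_size,
              bool past_present_share_buffer) {
  if (!past_present_share_buffer) {
    // Separate buffers: copy the existing cache into the present buffer
    // before appending this step's keys/values.
    std::memcpy(present, past, past_len * head_size * sizeof(float));
  }
  // Shared buffer: past and present alias the same allocation, so only the
  // new slice needs to be written, at offset past_len.
  std::memcpy(present + past_len * head_size, new_kv,
              new_len * head_size * sizeof(float));
}
```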
lint
993140b2
Added past_present_share_buffer to the hint; fixed a typo.
7502493a
past_present_share_buffer related changes.
5f1fdaea
lint
6d2bd68f
Fix integer division
82a005de
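The commit does not say which expression was affected; a typical culprit in dispatch-size math is truncating integer division, which under-counts workgroups when the size is not an exact multiple. The usual fix is to round up:

```cpp
#include <cstdint>

// Ceiling division: rounds up instead of truncating toward zero.
constexpr uint32_t CeilDiv(uint32_t size, uint32_t workgroup_size) {
  return (size + workgroup_size - 1) / workgroup_size;
}

static_assert(CeilDiv(10, 4) == 3, "10 elements need 3 workgroups of 4");
```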
Updated hints
fd9409fc
match jsep code
15c96b3d
Fixed a minor issue
72601d1e
lint
65495b6b
Fix a bug using total_sequence_length instead of uniform.total_sequen…
63f20ed3
Revert "match jsep code"
0102206e
Removed is_first_prompt from uniforms.
71ed10c1
Updated hints
9c08c821
Use kv_num_heads instead of num_heads for key/value input shape conversion.
eb5d7b4e
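The underlying point: in grouped-query attention the key/value tensors carry kv_num_heads heads while the query carries num_heads, so converting K/V from (batch, sequence, hidden) to BNSH must divide by kv_num_heads. A small sketch with hypothetical helper names:

```cpp
#include <array>
#include <cstdint>

// Shape bookkeeping behind the fix: reshape K/V from (B, S, D_kv) to
// (B, N_kv, S, H) using kv_num_heads, not num_heads.
std::array<int64_t, 4> KvShapeBNSH(int64_t batch, int64_t kv_sequence_length,
                                   int64_t kv_hidden_size, int64_t kv_num_heads) {
  const int64_t head_size = kv_hidden_size / kv_num_heads;
  return {batch, kv_num_heads, kv_sequence_length, head_size};
}
```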
lint
7a2d3b6b
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
664022fb
changed variable name
a48d782c
Removed is_first_prompt from uniforms, used in a condition generating…
4334b396
error
d53d7ef6
initialize scale
5dc95c83
Calculate output chunk size based on whether the kernel is GQA or not.
e448b1aa
Revert "Calculate output chunk size based on whether the kernel is GQ…
60af2f52
Bug fix
47e6f525
guschmue dismissed these changes on 2024-11-25
Reapply "Calculate output chunk size based on whether the kernel is G…
b494c732
tmp
0f110883
Simplified logic.
217058de
lint
7f53931c
Fixed a minor coding issue
64976fdb
lint
2677a0bf
Revert "tmp"
f209a384
Revert "Reapply "Calculate output chunk size based on whether the ker…
1aff4d49
guschmue approved these changes on 2024-12-02
guschmue merged commit e8bf46a7 into main 1 year ago
guschmue deleted the sajandhy/webgpu-ep-gqa-new branch 1 year ago