onnxruntime
[WebGPU EP] Support GroupQueryAttention
#22658
Merged

guschmue merged 52 commits into main from sajandhy/webgpu-ep-gqa-new
satyajandhyala
github-advanced-security commented on 2024-10-30
github-actions dismissed these changes on 2024-10-31
github-advanced-security commented on 2024-10-31
satyajandhyala marked this pull request as ready for review 1 year ago
skottmckay commented on 2024-11-04
skottmckay commented on 2024-11-04
satyajandhyala force pushed from 514217fc to d49ecb41 1 year ago
guschmue added the ep:WebGPU label
github-advanced-security commented on 2024-11-06
github-actions commented on 2024-11-06
github-actions commented on 2024-11-06
satyajandhyala Added attention_common.h
0a5d2129
satyajandhyala wip
5bfa0705
satyajandhyala Fix compilation errors
e6615e9e
satyajandhyala lint
449afb4d
satyajandhyala Modified MultiHeadAttention to not derive from AttentionBase class
8d104726
satyajandhyala Uncomment GQA registration
4ea58d1e
satyajandhyala Moved TransferBSToBNSH and ApplyAttention declaration to attention_co…
4bcf257a
satyajandhyala Revert "Modified MultiHeadAttention to not derive from AttentionBase …
5c5c9344
satyajandhyala Converted CheckInput function to template to fix compiler/linker mult…
e7165469
satyajandhyala lint
aba59e5a
satyajandhyala Fixed conflicts.
067ecd18
satyajandhyala copying errors
53f1c78d
satyajandhyala Fixed inplacesoftmax dispatch
f4dc9fc6
satyajandhyala Initialize required parameter data
3d1af1c6
satyajandhyala Map total_seqlen_tensor input to CPU
2eaeebc3
satyajandhyala Use uniforms variable name consistently to avoid confusion.
9c828ccb
satyajandhyala Keep InplaceSoftmax dispatch 3-dim.
26caa060
satyajandhyala Formatting changes.
64b093f6
satyajandhyala Use total_seqlen_tensor input only to determine is_first_prompt.
a8bd38bf
satyajandhyala initialize is_packed_qkv_
d613df42
satyajandhyala Handle past key/value and present key/value buffer sharing.
0fedb9fa
satyajandhyala lint
993140b2
satyajandhyala Added past_present_share_buffer to the hint. typo
7502493a
satyajandhyala past_present_share_buffer related changes.
5f1fdaea
satyajandhyala force pushed from 4a072b52 to 5f1fdaea 1 year ago
github-actions commented on 2024-11-13
satyajandhyala lint
6d2bd68f
satyajandhyala Fix integer division
82a005de
satyajandhyala Updated hints
fd9409fc
satyajandhyala match jsep code
15c96b3d
satyajandhyala Fixed a minor issue
72601d1e
github-actions commented on 2024-11-14
satyajandhyala lint
65495b6b
satyajandhyala Fix a bug using total_sequence_length instead of uniform.total_sequen…
63f20ed3
satyajandhyala Revert "match jsep code"
0102206e
satyajandhyala Removed is_first_prompt from uniforms.
71ed10c1
satyajandhyala Updated hints
9c08c821
satyajandhyala Use kv_num_heads instead of num_heads for key/value input shape conversion.
eb5d7b4e
github-actions commented on 2024-11-18
satyajandhyala lint
7a2d3b6b
satyajandhyala Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
664022fb
satyajandhyala changed variable name
a48d782c
satyajandhyala Removed is_first_prompt from uniforms, used in a condition generating…
4334b396
satyajandhyala error
d53d7ef6
satyajandhyala initialize scale
5dc95c83
satyajandhyala Calculate output chunk size based on whether the kernel is GQA or not.
e448b1aa
satyajandhyala Revert "Calculate output chunk size based on whether the kernel is GQ…
60af2f52
satyajandhyala Bug fix
47e6f525
guschmue dismissed these changes on 2024-11-25
satyajandhyala Reapply "Calculate output chunk size based on whether the kernel is G…
b494c732
satyajandhyala tmp
0f110883
satyajandhyala Simplified logic.
217058de
satyajandhyala dismissed their stale review via 217058de 1 year ago
satyajandhyala lint
7f53931c
satyajandhyala minor coding issue
64976fdb
github-actions commented on 2024-11-26
github-advanced-security commented on 2024-11-26
satyajandhyala lint
2677a0bf
satyajandhyala requested a review from guschmue 1 year ago
satyajandhyala dismissed their stale review 1 year ago
Fixed lint errors and coding guidelines
satyajandhyala Revert "tmp"
f209a384
satyajandhyala Revert "Reapply "Calculate output chunk size based on whether the ker…
1aff4d49
guschmue approved these changes on 2024-12-02
guschmue merged e8bf46a7 into main 1 year ago
guschmue deleted the sajandhy/webgpu-ep-gqa-new branch 1 year ago