[WebGPU EP] Support GroupQueryAttention #22658
satyajandhyala marked this pull request as ready for review 1 year ago
Added attention_common.h
0a5d2129
wip
5bfa0705
Fix compilation errors
e6615e9e
lint
449afb4d
Modified MultiHeadAttention to not derive from AttentionBase class
8d104726
Uncomment GQA registration
4ea58d1e
Moved TransferBSToBNSH and ApplyAttention declaration to attention_co…
4bcf257a
Revert "Modified MultiHeadAttention to not derive from AttentionBase …
5c5c9344
Converted CheckInput function to template to fix compiler/linker mult…
e7165469
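For context on the multiple-definition problem this commit works around: a helper defined in a header that is included by both the MultiHeadAttention and GroupQueryAttention translation units gets emitted twice and fails to link. Below is a minimal, self-contained sketch of the idea only; the parameter struct and body are hypothetical and this is not the PR's actual CheckInputs signature.

```cpp
// checkinputs_sketch.h -- illustration only, not the PR's code.
#pragma once
#include <stdexcept>

// If this were a plain (non-inline) function defined in a shared header, each
// .cc file including it would emit its own copy and the linker would report a
// multiple-definition error. As a template, every instantiation is implicitly
// inline, so the duplicate copies are folded into one.
template <typename ParametersT>
void CheckInputs(int num_heads, int kv_num_heads, ParametersT& params) {
  if (kv_num_heads == 0 || num_heads % kv_num_heads != 0) {
    throw std::invalid_argument("num_heads must be divisible by kv_num_heads");
  }
  params.num_heads = num_heads;        // hypothetical fields on the caller's
  params.kv_num_heads = kv_num_heads;  // parameter struct
}
```

Marking the function `inline` would also satisfy the one-definition rule; a template additionally lets each operator pass its own parameter type.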
lint
aba59e5a
Fixed conflicts.
067ecd18
Fixed copying errors.
53f1c78d
Fixed InplaceSoftmax dispatch
f4dc9fc6
Initialize required parameter data
3d1af1c6
Map total_seqlen_tensor input to CPU
2eaeebc3
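For readers unfamiliar with the pattern: pinning a kernel input to host memory is done at kernel registration time in ONNX Runtime. The sketch below shows the general shape of such a registration; the type-constraint helper and the input index 6 for total_sequence_length are assumptions based on the GroupQueryAttention contrib-op spec, not code copied from this PR.

```cpp
// Sketch of a WebGPU EP kernel registration that keeps one input on the CPU.
ONNX_OPERATOR_KERNEL_EX(
    GroupQueryAttention,
    kMSDomain,
    1,
    kWebGpuExecutionProvider,
    (*KernelDefBuilder::Create())
        .TypeConstraint("T", WebGpuSupportedFloatTypes())  // assumed helper
        // total_sequence_length is a scalar read on the host (e.g. to decide
        // whether this is the first prompt pass), so keep it in CPU memory
        // instead of uploading it to a GPU buffer.
        .InputMemoryType(OrtMemTypeCPUInput, 6),  // index per the op spec
    GroupQueryAttention);
```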
Use uniforms variable name consistently to avoid confusion.
9c828ccb
Keep InplaceSoftmax dispatch 3-dim.
26caa060
Formatting changes.
64b093f6
Use total_seqlen_tensor input only to determine is_first_prompt.
a8bd38bf
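A minimal sketch, with hypothetical names, of how the prompt-vs-decode decision can be derived from that CPU-resident scalar:

```cpp
#include <cstdint>

// Sketch only: the real kernel reads these values from ORT tensors/parameters.
bool IsFirstPrompt(int32_t total_sequence_length, int32_t sequence_length) {
  // With no KV cache yet, everything seen so far is the current prompt, so the
  // total length equals the current query's sequence length; otherwise we are
  // decoding against past tokens.
  return total_sequence_length == sequence_length;
}
```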
initialize is_packed_qkv_
d613df42
Handle past key/value and present key/value buffer sharing.
0fedb9fa
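The commit message does not show the mechanism, but the usual contract is: with past_present_share_buffer, past and present key/value alias one pre-allocated buffer sized for the maximum sequence length and only the new tokens are appended; without sharing, the past cache is first copied into the freshly allocated present buffer. An illustrative CPU-side sketch of that branching (the real kernel does this with GPU buffers and shaders):

```cpp
#include <cstddef>
#include <cstring>

// Illustration of the two KV-cache layouts the kernel has to handle.
void AppendKv(const float* past, size_t past_len,
              const float* new_kv, size_t new_len,
              float* present, size_t head_size,
              bool past_present_share_buffer) {
  if (!past_present_share_buffer) {
    // Separate buffers: copy the existing cache into the present buffer
    // before appending this step's keys/values.
    std::memcpy(present, past, past_len * head_size * sizeof(float));
  }
  // Shared buffer: past and present alias the same allocation, so only the
  // new slice needs to be written, at offset past_len.
  std::memcpy(present + past_len * head_size, new_kv,
              new_len * head_size * sizeof(float));
}
```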
lint
993140b2
Added past_present_share_buffer to the hint; fixed a typo.
7502493a
past_present_share_buffer related changes.
5f1fdaea
lint
6d2bd68f
Fix integer division
82a005de
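The commit does not say which expression was affected; a typical culprit in dispatch-size math is truncating integer division, which under-counts workgroups when the size is not an exact multiple. The usual fix is to round up:

```cpp
#include <cstdint>

// Ceiling division: rounds up instead of truncating toward zero.
constexpr uint32_t CeilDiv(uint32_t size, uint32_t workgroup_size) {
  return (size + workgroup_size - 1) / workgroup_size;
}

static_assert(CeilDiv(10, 4) == 3, "10 elements need 3 workgroups of 4");
```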
Updated hints
fd9409fc
match jsep code
15c96b3d
Fixed a minor issue
72601d1e
lint
65495b6b
Fix a bug using total_sequence_length instead of uniform.total_sequen…
63f20ed3
Revert "match jsep code"
0102206e
Removed is_first_prompt from uniforms.
71ed10c1
Updated hints
9c08c821
Use kv_num_heads instead of num_heads for key/value input shape conversion.
eb5d7b4e
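The underlying point: in grouped-query attention the key/value tensors carry kv_num_heads heads while the query carries num_heads, so converting K/V from (batch, sequence, hidden) to BNSH must divide by kv_num_heads. A small sketch with hypothetical helper names:

```cpp
#include <array>
#include <cstdint>

// Shape bookkeeping behind the fix: reshape K/V from (B, S, D_kv) to
// (B, N_kv, S, H) using kv_num_heads, not num_heads.
std::array<int64_t, 4> KvShapeBNSH(int64_t batch, int64_t kv_sequence_length,
                                   int64_t kv_hidden_size, int64_t kv_num_heads) {
  const int64_t head_size = kv_hidden_size / kv_num_heads;
  return {batch, kv_num_heads, kv_sequence_length, head_size};
}
```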
lint
7a2d3b6b
Merge branch 'main' of https://github.com/microsoft/onnxruntime into …
664022fb
changed variable name
a48d782c
Removed is_first_prompt from uniforms, used in a condition generating…
4334b396
error
d53d7ef6
initialize scale
5dc95c83
Calculate output chunk size based on whether the kernel is GQA or not.
e448b1aa
Revert "Calculate output chunk size based on whether the kernel is GQ…
60af2f52
Bug fix
47e6f525
guschmue dismissed these changes on 2024-11-25
Reapply "Calculate output chunk size based on whether the kernel is G…
b494c732
tmp
0f110883
Simplified logic.
217058de
lint
7f53931c
Fixed a minor coding issue
64976fdb
lint
2677a0bf
Revert "tmp"
f209a384
Revert "Reapply "Calculate output chunk size based on whether the ker…
1aff4d49
guschmue approved these changes on 2024-12-02
guschmue merged commit e8bf46a7 into main 1 year ago
guschmue deleted the sajandhy/webgpu-ep-gqa-new branch 1 year ago