DecoderMaskedMultiHeadAttention CPU kernel. #22292
DecoderMaskedMultiHeadAttention CPU kernel.
10cd0279
Fix attention for no-beam case
f4dd1dc9
Fix errors; update unit test cases
33fd0f10
Fix some CI errors.
0af5deb1
Fix: pick up local unstaged changes.
5e745902
Fix error; add broadcast for attn_bias; resolve comments
1c90af9f
Format
ae766526
mindest
marked this pull request as ready for review 1 year ago
Update doc.
caddf065
Add updated op kernel doc.
b2e35b46
mindest
changed the title from "[WIP] DecoderMaskedMultiHeadAttention CPU kernel." to "DecoderMaskedMultiHeadAttention CPU kernel." 1 year ago
Resolve comments.
4f96ddb4
Resolve more comments; fix warning
3ee8d538
Fix CI, warnings.
f9f4ff36
tianleiwu
dismissed these changes
on 2024-10-12
Fix warning; rename to output_qk
5b5e7913
mindest
dismissed their stale review
via 5b5e7913
1 year ago
typo
67a653e3
tianleiwu
approved these changes
on 2024-10-12
tianleiwu
merged
1fa219d7
into main 1 year ago
tianleiwu
deleted the linmin/cpu_dmmha branch 1 year ago
Assignees
No one assigned