onnxruntime
1f509215 - Fix GroupQueryAttention benchmark script (#20291)

Commit

1 year ago

Fix GroupQueryAttention benchmark script (#20291)

### Description
Fix a few issues in GQA:
(1) Memory efficient attention does not support bfloat16, so it must be disabled when bfloat16 is used.
(2) When the prompt length is 1, it is not classified as a prompt.
(3) Fix benchmark_gqa.py.
(4) Add a comment about seqlen_k to avoid confusion.

### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/20279
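Items (1) and (2) in the description can be illustrated with a minimal Python sketch. This is not the code changed by the commit: the function names, the string dtype labels, and the rule "a call is a prompt when the new tokens account for the whole sequence" are all assumptions made for illustration only.

```python
def use_memory_efficient_attention(dtype: str, enabled: bool) -> bool:
    """Hypothetical guard: skip memory efficient attention for bfloat16.

    `dtype` is an illustrative string label ("float16", "bfloat16", ...),
    not an onnxruntime type.
    """
    return enabled and dtype != "bfloat16"


def is_prompt(sequence_length: int, total_sequence_length: int) -> bool:
    """Hypothetical classification: a call is a prompt when there is no past
    key/value state, i.e. the new tokens make up the whole sequence.

    A heuristic such as `sequence_length > 1` would misclassify a
    one-token prompt as token generation.
    """
    return sequence_length == total_sequence_length


if __name__ == "__main__":
    print(use_memory_efficient_attention("bfloat16", enabled=True))
    print(is_prompt(sequence_length=1, total_sequence_length=1))
    print(is_prompt(sequence_length=1, total_sequence_length=9))
```

Comparing against the total length rather than testing for a length greater than 1 is one plausible way to handle a one-token prompt; the actual fix in the commit may differ.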

References

#20291 - Fix GroupQueryAttention benchmark script

Author

tianleiwu

Parents

b6d9abf1
