onnxruntime
1f509215 - Fix GroupQueryAttention benchmark script (#20291)

Committed 1 year ago
### Description
Fix a few issues in GQA:
(1) Memory-efficient attention does not support bfloat16, so it must be disabled when bfloat16 is used.
(2) A prompt of length 1 was not classified as a prompt.
(3) Fix benchmark_gqa.py.
(4) Add a comment about seqlen_k to avoid confusion.

### Motivation and Context
https://github.com/microsoft/onnxruntime/pull/20279
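Items (1) and (2) describe two decisions: which attention kernel to dispatch, and whether a step counts as a prompt. The sketch below illustrates both. It is a hypothetical model, not the onnxruntime implementation. The names `Kernel`, `select_kernel`, and `is_prompt`, the flash-first dispatch order, and the rule "prompt when the new tokens span the whole sequence" are all assumptions made for illustration. The only point taken from the commit is that memory-efficient attention must be skipped for bfloat16.

```python
from enum import Enum


class Kernel(Enum):
    # Hypothetical kernel choices for a GQA-style attention op.
    FLASH = "flash"
    MEMORY_EFFICIENT = "memory_efficient"
    UNFUSED = "unfused"


def select_kernel(dtype: str, flash_ok: bool, mem_eff_ok: bool) -> Kernel:
    """Pick a kernel using an assumed dispatch order (flash first).

    Memory-efficient attention is skipped for bfloat16 because, per the
    commit message, it has no bfloat16 support.
    """
    if flash_ok:
        return Kernel.FLASH
    if mem_eff_ok and dtype != "bfloat16":
        return Kernel.MEMORY_EFFICIENT
    # Fallback path when no fused kernel applies.
    return Kernel.UNFUSED


def is_prompt(sequence_length: int, total_sequence_length: int) -> bool:
    """Assumed rule: a step is a prompt if its new tokens cover the whole sequence.

    A naive check such as `sequence_length > 1` would misclassify a
    length-1 prompt as a decoding step; comparing against the total
    length classifies it correctly.
    """
    return sequence_length == total_sequence_length


if __name__ == "__main__":
    print(select_kernel("bfloat16", flash_ok=False, mem_eff_ok=True))
    print(is_prompt(1, 1), is_prompt(1, 5))
```

With flash attention unavailable, a bfloat16 input falls through to the unfused path, while float16 can still use memory-efficient attention. A single-token first step is treated as a prompt, but a single-token step after earlier tokens is not.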