onnxruntime
4e27841b - fix gqa cpu nan bug (#20521)

Commit
1 year ago
fix gqa cpu nan bug (#20521)

### Description

There was a bug in GQA on CPU: in the token case, with batch_size > 1 and past_present_share_buffer off, the output would occasionally contain NaNs. This PR fixes that. It also updates the documentation and fixes position ID generation for rotary embedding in CUDA in the prompt case.

### Motivation and Context

This PR solves the GQA CPU bug, updates the documentation, and makes seqlens_k irrelevant in the prompt case, which helps prevent user error.
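To illustrate what "seqlens_k irrelevant in the prompt case" could mean for rotary position IDs, here is a minimal sketch in plain Python. It is not the ONNX Runtime kernel: the function name `rotary_position_ids` and its parameters are hypothetical, and it assumes that in the token case `seqlens_k[b]` holds the zero-based index of the new token for batch entry `b`.

```python
def rotary_position_ids(batch_size, sequence_length, seqlens_k, is_prompt):
    """Hypothetical helper; illustrates the idea, not the actual CUDA code."""
    if is_prompt:
        # Prompt case: every batch entry starts at position 0, so the
        # contents of seqlens_k do not influence the result at all.
        return [list(range(sequence_length)) for _ in range(batch_size)]
    # Token case (assumption): one new token per batch entry, placed at
    # the index given by seqlens_k[b].
    return [[seqlens_k[b]] for b in range(batch_size)]


# Prompt case: arbitrary seqlens_k values are ignored.
prompt_ids = rotary_position_ids(2, 3, [99, 7], is_prompt=True)

# Token case: each batch entry gets its own position.
token_ids = rotary_position_ids(2, 1, [4, 9], is_prompt=False)
```

Under these assumptions, a caller who passes a wrong `seqlens_k` during the prompt step still gets correct positions, which is the kind of user error the commit message says it wants to prevent.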