onnxruntime
769d379c - Refactor MultiHeadAttention cpu op (#21055)

Commit
1 year ago
Refactor MultiHeadAttention cpu op (#21055)

Refactoring of MultiHeadAttention op
- [x] Add some checking for cross attention of pass_past_in_kv to make sure there is no kv cache and bias.
- [x] Update interface of PackVIntoRotaryQKV so that it can be used by SparseAttention later.
- [x] Add test cases

### Motivation and Context
To prepare the pull request for SparseAttention cpu op.
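The first checklist item (rejecting a kv cache or bias when cross attention passes past state in key/value) can be illustrated with a minimal sketch. This is not the onnxruntime implementation: the function name `validate_cross_attention_inputs` and its parameters are hypothetical, chosen only to mirror the terms used in the commit message, and the real op performs its checks in C++ on tensor inputs.

```python
from typing import Optional


def validate_cross_attention_inputs(
    pass_past_in_kv: bool,
    bias: Optional[object] = None,
    past_key: Optional[object] = None,
    past_value: Optional[object] = None,
) -> None:
    """Hypothetical input check, assuming the rule stated in the commit message.

    When past state is passed through the key/value inputs (cross attention),
    a separate kv cache and a bias are treated as invalid.
    """
    if not pass_past_in_kv:
        # The rule only applies to the pass_past_in_kv case.
        return
    if bias is not None:
        raise ValueError("bias is not expected when pass_past_in_kv is set")
    if past_key is not None or past_value is not None:
        raise ValueError("kv cache is not expected when pass_past_in_kv is set")


# Passes: cross attention with neither bias nor kv cache.
validate_cross_attention_inputs(pass_past_in_kv=True)
```

Calling the sketch with `pass_past_in_kv=True` and a non-`None` `bias`, `past_key`, or `past_value` raises `ValueError`; any combination is accepted when `pass_past_in_kv` is `False`.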