onnxruntime
769d379c - Refactor MultiHeadAttention cpu op (#21055)

Commit

1 year ago

Refactor MultiHeadAttention cpu op (#21055)

Refactoring of MultiHeadAttention op
- [x] Add some checking for cross attention of pass_past_in_kv to make sure there is no kv cache and bias.
- [x] Update interface of PackVIntoRotaryQKV so that it can be used by SparseAttention later.
- [x] Add test cases

### Motivation and Context
To prepare the pull request for SparseAttention cpu op.

References

#21055 - Refactor MultiHeadAttention cpu op

Author

tianleiwu

Parents

c3076721
