onnxruntime
7aafd862 - Update Attention operator to support separated Q/K/V inputs (#13410)

Commit · 3 years ago
Update Attention operator to support separated Q/K/V inputs (#13410)

### Description
Allow separated Q, K, and V inputs to support cross attention:
* Q: [batch_size, sequence_length, hidden_size]
* K: [batch_size, kv_sequence_length, hidden_size]
* V: [batch_size, kv_sequence_length, v_hidden_size]
* Output: [batch_size, sequence_length, v_hidden_size]

To use separated Q/K/V inputs, the existing input tensor carries the query, and two optional inputs are added for key and value. Weights for the input projection are not included for now, so the MatMul of the input projection must be done outside the Attention operator; the bias Add, however, is fused into the operator for performance.
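The shape contract above can be illustrated with a rough NumPy sketch of the cross-attention semantics. This is not the actual ONNX Runtime kernel; the function name, signature, and head-splitting details are illustrative assumptions. It mirrors the commit's contract: the Q/K/V projections happen outside, only the bias Add happens inside.

```python
import numpy as np

def cross_attention(q, k, v, bias_q, bias_k, bias_v, num_heads):
    """Illustrative sketch (not the real kernel) of cross attention
    with separated, already-projected Q/K/V inputs.

    q: [batch, seq_len, hidden], k: [batch, kv_seq_len, hidden],
    v: [batch, kv_seq_len, v_hidden]. The input-projection MatMuls
    are assumed done by the caller; only the bias Add is fused here.
    """
    batch, seq_len, hidden = q.shape
    kv_seq_len = k.shape[1]
    v_hidden = v.shape[2]
    head = hidden // num_heads
    v_head = v_hidden // num_heads

    # Fused bias Add (the projection weights stay outside the op).
    q = q + bias_q
    k = k + bias_k
    v = v + bias_v

    # Split into heads: [batch, num_heads, length, head_size]
    q = q.reshape(batch, seq_len, num_heads, head).transpose(0, 2, 1, 3)
    k = k.reshape(batch, kv_seq_len, num_heads, head).transpose(0, 2, 1, 3)
    v = v.reshape(batch, kv_seq_len, num_heads, v_head).transpose(0, 2, 1, 3)

    # Scaled dot-product attention over the kv sequence.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = probs @ v  # [batch, num_heads, seq_len, v_head]

    # Merge heads back: [batch, seq_len, v_hidden]
    return out.transpose(0, 2, 1, 3).reshape(batch, seq_len, v_hidden)
```

Note that the output takes its sequence length from Q but its hidden size from V, exactly as in the shape list above.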