onnxruntime
7e2c7224 - Add Continuous Decoding support in GQA (#21523)

Commit
1 year ago
### Description

This PR adds Continuous Decoding support for inputs with batch_size = 1. From now on, GQA can take input of arbitrary length, with seqlens_k set to total_sequence_length - 1 and the sequence length of qkv set to new_sequence_length.

**This change will not affect the default behavior of GQA**

### Motivation and Context

Prior to this change, it was impossible to support inputs with sequence_length > 1 when past context was given. This use case is essential to making continuous decoding work, which is one of our current efforts in ORT-GenAI.
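The length bookkeeping described above can be sketched as follows. This is a minimal illustration, not code from the PR: the helper `gqa_length_inputs` and its parameter `past_sequence_length` are hypothetical names, and it assumes total_sequence_length is the past context length plus new_sequence_length. Only the arithmetic (seqlens_k = total_sequence_length - 1, batch_size = 1) comes from the description.

```python
def gqa_length_inputs(past_sequence_length: int, new_sequence_length: int) -> dict:
    """Hypothetical helper: compute the length-related GQA inputs for one step.

    Assumes batch_size = 1 and that total_sequence_length equals the tokens
    already in the past KV context plus the tokens in the current qkv input.
    """
    if past_sequence_length < 0 or new_sequence_length < 1:
        raise ValueError("need past_sequence_length >= 0 and new_sequence_length >= 1")

    total_sequence_length = past_sequence_length + new_sequence_length
    return {
        # One entry per batch item; batch_size = 1, so a single value.
        "seqlens_k": [total_sequence_length - 1],
        "total_sequence_length": total_sequence_length,
        # Sequence length of the qkv input for this step.
        "new_sequence_length": new_sequence_length,
    }


# Prompt step: no past context, 5 prompt tokens.
prompt_step = gqa_length_inputs(past_sequence_length=0, new_sequence_length=5)

# Continuous decoding step: 5 tokens of past context, 3 new tokens at once.
continue_step = gqa_length_inputs(past_sequence_length=5, new_sequence_length=3)
```

In the second call, the input has sequence_length > 1 while past context is present, which is the case the PR describes as previously unsupported.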