onnxruntime
bd61ae53 - relax seq len checking in rotary_emb (#20778)

Commit
1 year ago
### Description

Length checking is even stricter for packed batching input.

A batch of input_ids can take one of two forms:

- padded sequences, where all inputs have equal length.
```
|----********|
|------------|
|--------****|
|-***********|
```
- packed sequences, where the input_ids have different lengths.
`|----|---------|----|-|`

The max_seq_length comes either from graph_inputs or from the position_ids. In most cases, however, the max_seq_length of the rotary_cache is cached in the model and shared among all layers.

---------

Co-authored-by: kailums <kalu@microsoft.com>
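
To illustrate why a length check can be too strict for packed input, here is a minimal Python sketch. It is not the onnxruntime kernel logic: the function names, both check rules, and the cache length of 16 are hypothetical, chosen only to show that a packed row can be longer than the cached rotary length while every position id still fits inside it.

```python
# Illustrative sketch only; names and rules below are assumptions,
# not the actual rotary_emb implementation in onnxruntime.

def packed_position_ids(lengths):
    """Concatenate sequences into one row; positions restart at 0 per sequence."""
    ids = []
    for n in lengths:
        ids.extend(range(n))
    return ids


def required_cache_len(position_ids):
    """Smallest rotary cache length that covers every position id."""
    return max(position_ids) + 1


def check_strict(total_tokens, cache_len):
    # Hypothetical strict rule: the whole input row must fit in the cache.
    return total_tokens <= cache_len


def check_relaxed(position_ids, cache_len):
    # Hypothetical relaxed rule: only the largest position id must fit.
    return required_cache_len(position_ids) <= cache_len


# Sequence lengths read off the packed diagram |----|---------|----|-|
lengths = [4, 9, 4, 1]
ids = packed_position_ids(lengths)
cache_len = 16  # assumed max_seq_length cached in the model

print(len(ids), required_cache_len(ids))
print(check_strict(len(ids), cache_len), check_relaxed(ids, cache_len))
```

In this sketch the packed row holds 18 tokens, yet the largest position id is 8, so a check against the row length rejects an input that a check against the position ids accepts.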