onnxruntime
6e4516ce - Fix parity checker in LLaMA scripts (#20301)

Commit

1 year ago

Fix parity checker in LLaMA scripts (#20301)

### Description

This PR fixes the parity checker in the LLaMA scripts by adding the following.

- Enable buffer sharing manually with `use_buffer_share` instead of `use_gqa`
- Get max sequence length from model's config

### Motivation and Context

This PR fixes an issue with running the parity checker on other large-language models where `GroupQueryAttention` can be used without buffer sharing enabled.
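The two changes can be illustrated with a minimal sketch. This is not the code from the PR: `ModelConfig` and `kv_cache_shape` are hypothetical names, and the sketch assumes that the maximum sequence length is exposed as `max_position_embeddings` on a Hugging Face-style config and that a shared KV buffer is preallocated to that length. The point is only that the buffer-sharing decision is driven by its own `use_buffer_share` flag rather than inferred from whether `GroupQueryAttention` is in use.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Hypothetical stand-in for a Hugging Face-style model config.
    num_attention_heads: int
    num_key_value_heads: int
    hidden_size: int
    max_position_embeddings: int


def kv_cache_shape(config, batch_size, past_seq_len, use_buffer_share):
    """Return the (batch, kv_heads, seq_len, head_size) shape of one KV cache input.

    Assumption: with buffer sharing, past and present KV share one buffer
    that is preallocated to the model's maximum sequence length.
    """
    head_size = config.hidden_size // config.num_attention_heads
    # The max sequence length is read from the config, not hard-coded.
    seq_len = config.max_position_embeddings if use_buffer_share else past_seq_len
    return (batch_size, config.num_key_value_heads, seq_len, head_size)


config = ModelConfig(
    num_attention_heads=32,
    num_key_value_heads=8,
    hidden_size=4096,
    max_position_embeddings=4096,
)

# A model may use GroupQueryAttention either with or without buffer sharing,
# so the flag is passed explicitly.
shared = kv_cache_shape(config, batch_size=2, past_seq_len=16, use_buffer_share=True)
unshared = kv_cache_shape(config, batch_size=2, past_seq_len=16, use_buffer_share=False)
```

Under these assumptions, the shared case yields a cache sized to the full 4096 positions, while the unshared case is sized to the 16 past tokens only.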

References

#20301 - Fix parity checker in LLaMA scripts

Author

kunal-vaishnavi

Parents

bf72f996
