onnxruntime
6e4516ce - Fix parity checker in LLaMA scripts (#20301)

1 year ago
### Description
This PR fixes the parity checker in the LLaMA scripts with the following changes:
- Enable buffer sharing explicitly with `use_buffer_share` instead of tying it to `use_gqa`
- Read the maximum sequence length from the model's config

### Motivation and Context
This PR fixes an issue when running the parity checker on other large language models, where `GroupQueryAttention` can be used without buffer sharing enabled.
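The decoupling described above can be illustrated with a minimal sketch. It assumes that with buffer sharing the past and present KV caches use one buffer sized to the maximum sequence length, and that without sharing the past cache holds only the tokens seen so far. `ModelConfig` and `kv_cache_shape` are hypothetical names for illustration only, not the LLaMA scripts' actual code. The field names mirror a Hugging Face-style LLaMA config, but that is also an assumption. The point is that the cache layout depends only on `use_buffer_share`, so a model using `GroupQueryAttention` can still run with unshared buffers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ModelConfig:
    # Hypothetical stand-in for a model config; field names are assumed
    # to mirror a Hugging Face-style LLaMA config.
    num_attention_heads: int
    num_key_value_heads: int
    hidden_size: int
    max_position_embeddings: int


def kv_cache_shape(
    config: ModelConfig, batch_size: int, past_seq_len: int, use_buffer_share: bool
) -> Tuple[int, int, int, int]:
    """Return an assumed (batch, kv_heads, seq_len, head_size) shape for the past KV cache."""
    head_size = config.hidden_size // config.num_attention_heads
    if use_buffer_share:
        # Shared buffer: sized to the max sequence length read from the config,
        # rather than a hard-coded value.
        seq_len = config.max_position_embeddings
    else:
        # No sharing: the past cache holds only the tokens processed so far.
        seq_len = past_seq_len
    return (batch_size, config.num_key_value_heads, seq_len, head_size)


cfg = ModelConfig(
    num_attention_heads=32,
    num_key_value_heads=8,
    hidden_size=4096,
    max_position_embeddings=4096,
)
print(kv_cache_shape(cfg, 1, 7, use_buffer_share=True))   # (1, 8, 4096, 128)
print(kv_cache_shape(cfg, 1, 7, use_buffer_share=False))  # (1, 8, 7, 128)
```

Whether GQA is used is a separate choice that this sketch does not consult, which matches the motivation of not assuming that GQA implies buffer sharing.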