onnxruntime
c8def0cc - Add LLaMA GQA ragged batching (#18337)

2 years ago
This PR updates the logic that replaces MHA (MultiHeadAttention) with GQA (GroupQueryAttention), and updates the LLaMA scripts to work with the modified GQA op. It is related to the changes in [this PR](https://github.com/microsoft/onnxruntime/pull/18283).

### Motivation and Context

This PR allows us to run LLaMA end-to-end with the GQA op using ragged batching (i.e., batched inputs whose sequences have different lengths).
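The sketch below shows what a ragged batch looks like before it reaches the model. Sequences of different lengths are padded into one rectangular batch, and the true per-sequence lengths are recorded alongside it. It is a minimal illustration only. The helper name `pad_ragged_batch`, the right-padding convention, and `pad_id=0` are all assumptions. It is not the LLaMA scripts' code, and it does not describe the exact inputs the GQA op expects.

```python
from typing import List, Tuple


def pad_ragged_batch(
    seqs: List[List[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Hypothetical helper: right-pad variable-length token sequences.

    Returns (input_ids, attention_mask, seq_lens), where attention_mask marks
    real tokens with 1 and padding with 0, and seq_lens holds each sequence's
    unpadded length.
    """
    max_len = max(len(s) for s in seqs)
    # Pad every sequence on the right up to the longest one (assumed convention).
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 for real tokens, 0 for padding positions.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    # Per-sequence lengths let an attention kernel skip the padded positions.
    seq_lens = [len(s) for s in seqs]
    return input_ids, attention_mask, seq_lens


if __name__ == "__main__":
    ids, mask, lens = pad_ragged_batch([[5, 6, 7], [8]])
    print(ids)   # [[5, 6, 7], [8, 0, 0]]
    print(mask)  # [[1, 1, 1], [1, 0, 0]]
    print(lens)  # [3, 1]
```

Passing per-sequence lengths alongside the padded tensor is what lets an attention implementation ignore the padding instead of treating it as real tokens.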