Add LLaMA GQA ragged batching (#18337)
This PR replaces MHA with GQA and updates the LLaMA scripts for the
modified GQA op. It builds on the changes in [this
PR](https://github.com/microsoft/onnxruntime/pull/18283).
### Motivation and Context
This PR allows us to run LLaMA with the GQA op end-to-end using ragged
batching (i.e. batched inputs of different lengths).
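As an illustration of the ragged-batching idea (not the actual script API), variable-length sequences can be padded into one rectangular batch while the true per-sequence lengths are kept in a separate tensor, which a GQA-style op can then use to ignore padding. The helper name `ragged_batch` below is hypothetical:

```python
import numpy as np

def ragged_batch(sequences, pad_id=0):
    """Pad variable-length sequences into one batch and record true lengths.

    Returns (input_ids, seq_lens):
      input_ids: (batch, max_len) int64, padded with pad_id
      seq_lens:  (batch,) int64, the unpadded length of each sequence
    """
    batch = len(sequences)
    max_len = max(len(s) for s in sequences)
    input_ids = np.full((batch, max_len), pad_id, dtype=np.int64)
    seq_lens = np.empty(batch, dtype=np.int64)
    for i, seq in enumerate(sequences):
        input_ids[i, : len(seq)] = seq
        seq_lens[i] = len(seq)
    return input_ids, seq_lens

ids, lens = ragged_batch([[5, 6, 7], [8, 9], [10]])
# ids has shape (3, 3); lens is [3, 2, 1]
```

The length tensor is what lets attention be computed only over each sequence's real tokens rather than the padding.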