Add LLaMA end-to-end benchmarking (#19985)
### Description
This PR adds a benchmarking script to measure end-to-end performance and
saves the results in a CSV file.
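The actual script lives in the ONNX Runtime repository; purely as an illustration of the measure-then-write-CSV pattern this PR follows, here is a minimal, self-contained sketch (all function names, column names, and the stand-in workload are hypothetical, not taken from the script itself):

```python
import csv
import statistics
import time


def benchmark(fn, warmup=2, runs=10):
    """Time a callable end to end and return latency stats in milliseconds."""
    for _ in range(warmup):  # discard warmup iterations (caches, JIT, allocators)
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1e3)
    return {
        "avg_ms": statistics.mean(latencies),
        "p50_ms": statistics.median(latencies),
        "min_ms": min(latencies),
        "max_ms": max(latencies),
    }


def save_results(rows, path="benchmark_results.csv"):
    """Write one CSV row per benchmarked configuration."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Stand-in workload in place of a real model inference call.
rows = []
for batch_size in (1, 4):
    stats = benchmark(lambda: sum(range(10_000 * batch_size)))
    rows.append({"batch_size": batch_size, **stats})
save_results(rows)
```

In the real script the timed callable would be a model inference call (e.g. an ONNX Runtime session run) rather than the arithmetic placeholder above.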
### Motivation and Context
With this PR, end-to-end performance can be easily measured for many
large language models such as LLaMA-2. The performance numbers for
LLaMA-2 are located
[here](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/python/models/llama).