Remove CUDA from `benchmarks` directory. (#9610)
This PR removes the CUDA-specific code from the `benchmarks` directory.
This is in line with the CUDA deprecation that started in release 2.8.

**Key Changes:**
- Removed the `keep_model_data_on_cuda` parameter
  - It was used in combination with the zero-overhead CUDA to XLA:CUDA data
    movement, which was removed in [#9598][1] and [#9603][2]
- Deleted `llama.py`, `nightly.sh`, `run_benchmark.sh`,
  `run_single_graph_bm.sh`, and `run_top_tier_bm.sh`
  - All of them ran benchmarks that specifically compared PyTorch Inductor
    with XLA:CUDA

[1]: https://github.com/pytorch/xla/pull/9598
[2]: https://github.com/pytorch/xla/pull/9603