Add new Arabic benchmarks (5) and enhance existing tasks (#372)
* Update arabic_evals.py
Add new Arabic benchmarks and update existing tasks
- Renamed `arabic_mmlu` to `arabic_mmlu_mt` to highlight its machine-translated origin.
- Added new benchmarks: `arabic_mmlu` (ArabicMMLU, https://arxiv.org/abs/2402.12840), `arabic_mmlu_ht` (human-translated), and `MadinahQA` from MBZUAI, as well as `arabic_mmmlu` (OpenAI MMMLU) and `AraTrust`, a trustworthiness benchmark for Arabic LLMs (https://arxiv.org/abs/2403.09017).
- Enhanced prompt functions for better flexibility in answer options.
* Update and rename OALL_tasks.txt to OALL_v1_tasks.txt
Rename file to reflect that it lists the v1 leaderboard tasks
* Create OALL_v2_tasks.txt
Tasks for v2 of OALL
* Update all_arabic_tasks.txt
add new and renamed tasks
* Update arabic_evals.py
Fix formatting issues for
* Update all_arabic_tasks.txt
Add missing task: Arabic subset of OpenAI's MMMLU
* Update all_arabic_tasks.txt
Correct order
* Update arabic_evals.py
Remove the OpenAI MMMLU task following the discussion here: https://github.com/huggingface/lighteval/pull/372
* Update all_arabic_tasks.txt
Remove the OpenAI MMMLU task following the discussion here: https://github.com/huggingface/lighteval/pull/372
* Update tasks.py
Add a templated version of Arabic MMLU at @hynky1999's request in PR #372
* Update tasks.py
remove arabic_mmlu_templated_tasks
---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>