lighteval
de8dba39 - Add new Arabic benchmarks (5) and enhance existing tasks (#372)

Commit

1 year ago

Add new Arabic benchmarks (5) and enhance existing tasks (#372)

* Update arabic_evals.py

  Add new Arabic benchmarks and update existing tasks

  - Renamed `arabic_mmlu` to `arabic_mmlu_mt` to highlight its machine-translated origin.
  - Added new benchmarks: `arabic_mmlu` ArabicMMLU (https://arxiv.org/abs/2402.12840), `arabic_mmlu_ht` (human-translated), and `MadinahQA` from MBZUAI, as well as `arabic_mmmlu` (OpenAI MMMLU) and `AraTrust`, a trustworthiness benchmark for Arabic LLMs (https://arxiv.org/abs/2403.09017).
  - Enhanced prompt functions for better flexibility in answer options.

* Update and rename OALL_tasks.txt to OALL_v1_tasks.txt

  Rename file to reflect that it contains the v1 leaderboard tasks

* Create OALL_v2_tasks.txt

  Tasks for v2 of OALL

* Update all_arabic_tasks.txt

  Add new and renamed tasks

* Update arabic_evals.py

  Fix formatting issues for

* Update all_arabic_tasks.txt

  Add missing task: OpenAI's MMMLU Arabic subset

* Update all_arabic_tasks.txt

  Correct order

* Update arabic_evals.py

  Remove OpenAI MMMLU task following the discussion here: https://github.com/huggingface/lighteval/pull/372

* Update all_arabic_tasks.txt

  Remove OpenAI MMMLU task following the discussion here: https://github.com/huggingface/lighteval/pull/372

* Update tasks.py

  Add a templated version of Arabic MMLU based on @hynky1999's request in the #372 PR

* Update tasks.py

  Remove arabic_mmlu_templated_tasks

---------

Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
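The message mentions prompt functions made more flexible about answer options, but does not show how. As a rough illustration only, and not the code from `arabic_evals.py`, the sketch below builds a multiple-choice prompt from a row with a variable number of options. The names `PromptDoc`, `build_prompt`, and `LABELS` are hypothetical, and the Latin labels are an assumption; the real tasks may label options differently.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set (assumption): the actual task file may use other labels.
LABELS = ["A", "B", "C", "D", "E"]


@dataclass
class PromptDoc:
    """Minimal stand-in for an evaluation document (not a lighteval class)."""
    query: str
    choices: List[str]
    gold_index: int


def build_prompt(question: str, options: List[Optional[str]], answer: str) -> PromptDoc:
    # Drop missing or empty options so rows with 2 to 5 options all work.
    valid = [opt for opt in options if opt]
    if len(valid) > len(LABELS):
        raise ValueError("more options than available labels")
    labels = LABELS[: len(valid)]
    option_lines = [f"{label}. {opt}" for label, opt in zip(labels, valid)]
    query = question + "\n" + "\n".join(option_lines) + "\nAnswer:"
    # The gold answer is given as a label; map it to its position.
    return PromptDoc(query=query, choices=labels, gold_index=labels.index(answer))


doc = build_prompt("2 + 2 = ?", ["3", "4", None, ""], "B")
print(doc.query)
```

Because the labels are sliced to the number of usable options, the same function serves datasets whose rows carry different option counts, which is one plausible reading of "flexibility in answer options".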

References

#372 - Add new Arabic benchmarks (5) and enhance existing tasks

Author

alielfilali01

Parents

6ad7276e
