Update arabic_evals.py
Add new Arabic benchmarks and update existing tasks
- Renamed `arabic_mmlu` to `arabic_mmlu_mt` to highlight its machine-translated origin.
- Added new benchmarks: `arabic_mmlu` (the native ArabicMMLU, https://arxiv.org/abs/2402.12840), `arabic_mmlu_ht` (human-translated), and `MadinahQA` from MBZUAI, as well as `arabic_mmmlu` (OpenAI MMMLU) and `AraTrust`, a trustworthiness benchmark for Arabic LLMs (https://arxiv.org/abs/2403.09017).
- Enhanced prompt functions to handle a variable number of answer options across benchmarks.
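
The option-handling change can be sketched roughly as follows. This is a hypothetical illustration, not the actual code in `arabic_evals.py`: the function name `build_mcq_prompt` and its signature are invented here to show how one prompt builder can serve benchmarks with differing numbers of choices.

```python
from string import ascii_uppercase

def build_mcq_prompt(question: str, options: list[str]) -> str:
    """Illustrative sketch: label options A, B, C, ... dynamically so that
    benchmarks with different choice counts (e.g. AraTrust vs. ArabicMMLU
    subsets) can share one prompt-construction code path."""
    letters = ascii_uppercase[: len(options)]
    option_lines = [f"{letter}. {option}" for letter, option in zip(letters, options)]
    return question + "\n" + "\n".join(option_lines) + "\nAnswer:"
```

Because the labels are derived from the option list length rather than hard-coded, the same function works whether a task provides two, four, or five answer choices.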