lighteval
eda7eed3 - MMLU Redux and Fixing the caching (#883)

Commit
96 days ago
MMLU-Redux added, with results similar to Qwen when using a generative metric.

3 changes to fix caching:
- Removed the tokenization saving system, since it was unused and bloating the code.
- Added a hash for task configs, to make sure we actually compare generations from the same task version (for example, changing task params changes the task hash). Side note: had to add a lot of __str__ methods to get pretty prints for logged classes.
- Separated samples from loglikelihood metrics from samples from generative metrics.
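The idea behind the task-config hash and the per-metric-type separation can be illustrated with a minimal sketch. This is not the code from this commit: `TaskConfig`, its fields, `task_config_hash`, and `cache_key` are hypothetical names chosen for illustration, and the choice of SHA-256 over sorted JSON is an assumption.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskConfig:
    """Hypothetical stand-in for a task configuration (not lighteval's class)."""

    name: str
    num_fewshot: int = 0
    generation_size: int = 256


def task_config_hash(config: TaskConfig) -> str:
    """Return a short, stable hash of the config's parameters."""
    # sort_keys makes the serialization independent of field ordering
    payload = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def cache_key(config: TaskConfig, sample_type: str) -> str:
    """Build a cache key that separates task versions and sample types."""
    # sample_type is assumed to be e.g. "loglikelihood" or "generative"
    return f"{config.name}/{task_config_hash(config)}/{sample_type}"


if __name__ == "__main__":
    base = TaskConfig(name="mmlu_redux")
    changed = TaskConfig(name="mmlu_redux", num_fewshot=5)
    print(cache_key(base, "generative"))
    print(cache_key(changed, "generative"))  # different hash segment
```

With a key built this way, changing any task parameter yields a different hash, so cached generations from an older task version are not reused, and loglikelihood and generative samples never share a cache entry.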