lighteval
eda7eed3 - MMLU Redux and Fixing the caching (#883)

Commit

96 days ago

Adds MMLU-Redux; results are similar to Qwen's when using a generative metric.

Three changes to fix caching:
- Removed the tokenization saving system, since it was unused and bloated the code.
- Added a hash for task configs, to make sure cached generations are only compared with generations from the same task version (for example, changing task parameters changes the task hash).
- Separated samples from loglikelihood metrics and samples from generative metrics.

Side note: had to add a lot of str methods to get readable prints for logged classes.
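The task-config hash mentioned in this commit can be illustrated with a minimal sketch. It assumes a hypothetical `TaskConfig` dataclass and hypothetical `config_hash` / `cache_is_valid` helpers; it is not lighteval's actual implementation. The idea is to serialize the config deterministically, hash it, and only reuse cached generations when the stored hash matches the current one.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class TaskConfig:
    # Hypothetical task config; field names are illustrative only.
    name: str
    prompt_function: str
    num_fewshots: int = 0
    generation_size: Optional[int] = None
    stop_sequence: Tuple[str, ...] = field(default_factory=tuple)

    def config_hash(self) -> str:
        # sort_keys makes the JSON serialization deterministic,
        # so equal configs always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_is_valid(cached_hash: str, config: TaskConfig) -> bool:
    # Reuse cached generations only if they were produced
    # by exactly the same task version.
    return cached_hash == config.config_hash()


base = TaskConfig(name="mmlu_redux", prompt_function="mmlu_prompt")
changed = TaskConfig(name="mmlu_redux", prompt_function="mmlu_prompt", num_fewshots=5)
```

Changing any parameter (here `num_fewshots`) yields a different hash, so `cache_is_valid(base.config_hash(), changed)` returns `False` and stale generations are not compared against the new task version.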

References

#883 - MMLU Redux and Fixing the caching

Author

clefourrier

Parents

7ed2636e
