lighteval
8c787df2 - Probability Metric + New Normalization (#276)

Commit
1 year ago
What does this implement/fix? Explain your changes.
---------------------------------------------------
This PR adds two new features:

1) A new probability metric that collects the probability of the correct answer. This can be either the raw probability or the probability mass (normalized over the other choices).
2) A revamp of Acc/Prob normalization, which adds two new normalizations:
   a) Token normalization, which we found to work better than acc norm for most non-English languages.
   b) PointwiseMutualInformation normalization, which is a good way to test tasks with unlikely tokens; see https://arxiv.org/abs/2406.08446

Lastly, I have made some small changes to the request processing, removing parts that are not needed and can easily cause bugs.

Comments
----------
- I am not really content with having a new category just for normalization, but I didn't find a better way in the current system. The problem is that when creating requests we only have access to the sample fc and nothing else, so we can't really do any kind of structural decomposition :(
- These new norms are only added for non-single-token task types. Adding them to single-token tasks would require improving the request-creation logic to be maintainable, and can be done in another PR.

PS: Relevant discussion about token norm: https://github.com/EleutherAI/lm-evaluation-harness/issues/1396

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
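To make the metric and normalizations described in this commit message concrete, here is a minimal standalone sketch in plain Python. It is not the lighteval implementation: every function name is hypothetical, and the formulas rest on assumptions, namely that acc norm divides a choice's log-probability by its character length, that token normalization divides by its token count, and that PMI normalization subtracts the choice's log-probability under an unconditioned prompt.

```python
import math

# Hypothetical helpers for illustration only; none of these names come from lighteval.

def raw_prob(logprobs, gold):
    """Raw probability of the gold choice."""
    return math.exp(logprobs[gold])

def prob_mass(logprobs, gold):
    """Probability of the gold choice renormalized over all choices."""
    m = max(logprobs)  # subtract the max for numerical stability
    exps = [math.exp(lp - m) for lp in logprobs]
    return exps[gold] / sum(exps)

def char_norm(logprobs, choices):
    """Assumed 'acc norm': divide each log-prob by the choice's character length."""
    return [lp / len(c) for lp, c in zip(logprobs, choices)]

def token_norm(logprobs, token_counts):
    """Assumed token normalization: divide each log-prob by the choice's token count."""
    return [lp / n for lp, n in zip(logprobs, token_counts)]

def pmi_norm(cond_logprobs, uncond_logprobs):
    """Assumed PMI normalization: log P(choice | question) - log P(choice | neutral prompt)."""
    return [c - u for c, u in zip(cond_logprobs, uncond_logprobs)]

def argmax(scores):
    """Index of the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up numbers: a short choice and a longer, a priori unlikely one.
choices = ["yes", "absolutely not"]
cond = [-2.0, -3.0]    # log P(choice | question)
uncond = [-1.0, -6.0]  # log P(choice | neutral prompt)
tokens = [1, 3]        # token counts under some tokenizer

pred_raw = argmax(cond)
pred_char = argmax(char_norm(cond, choices))
pred_token = argmax(token_norm(cond, tokens))
pred_pmi = argmax(pmi_norm(cond, uncond))
```

With these made-up numbers the unnormalized scores favor the short choice, while all three normalizations favor the longer one, which illustrates why the choice of normalization can change measured accuracy.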