lighteval
8c787df2 - Probability Metric + New Normalization (#276)

Commit

1 year ago

Probability Metric + New Normalization (#276)

What does this implement/fix? Explain your changes.
---------------------------------------------------
This PR adds two new features:
1) A new Probability metric, which collects the probability of the correct answer. This can be either the raw probability or the probability mass (normalized over the other choices).
2) Revamps the Acc/Prob normalization and adds two new normalizations:
   a) Token normalization, which we found to work better than acc norm for most non-English languages.
   b) PointwiseMutualInformation normalization, which is a good way to test tasks with unlikely tokens; see: https://arxiv.org/abs/2406.08446

Lastly, I made some small changes to the request processing, removing parts that are not needed and can easily cause bugs.

Comments
----------
- I am not really content with having a new category just for normalization, but I didn't find a better way in the current system. The problem is that when creating requests we only have access to the sample fc and nothing else, so we can't really do any kind of structural decomposition :(
- These new norms are only added for non-single-token tasks. Adding them to single-token tasks would require making the request-creation logic maintainable first, which can be done in another PR.

PS: Relevant discussion about token norm: https://github.com/EleutherAI/lm-evaluation-harness/issues/1396

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
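The metric and normalizations described in this commit message can be illustrated with a minimal sketch, assuming each answer choice's summed log-probability has already been computed by a model. `Choice`, `raw_probability`, `probability_mass`, `score` and `accuracy` are hypothetical names, not lighteval's API. The `"char"` option assumes acc norm means dividing by the choice's character length, and `uncond_logprob` assumes PMI compares against the log-probability of the choice given a short context-free prompt (for example `"Answer:"`).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Choice:
    """Hypothetical container: one answer choice with precomputed model scores."""
    text: str
    logprob: float               # summed token log-probs of the choice given the full prompt
    num_tokens: int              # number of tokens the choice was split into
    uncond_logprob: float = 0.0  # log-prob of the choice given a context-free prompt (assumed)


def raw_probability(choices: list[Choice], gold: int) -> float:
    """Raw probability of the gold choice: exp of its summed log-prob."""
    return math.exp(choices[gold].logprob)


def probability_mass(choices: list[Choice], gold: int) -> float:
    """Gold probability renormalized over all choices (a softmax over log-probs)."""
    m = max(c.logprob for c in choices)  # subtract the max for numerical stability
    denom = sum(math.exp(c.logprob - m) for c in choices)
    return math.exp(choices[gold].logprob - m) / denom


def score(choice: Choice, norm: str | None) -> float:
    """Score one choice under the requested normalization."""
    if norm is None:
        return choice.logprob
    if norm == "char":   # acc-norm style (assumed): divide by character length
        return choice.logprob / len(choice.text)
    if norm == "token":  # token normalization: divide by number of tokens
        return choice.logprob / choice.num_tokens
    if norm == "pmi":    # log P(choice | context) - log P(choice | context-free prompt)
        return choice.logprob - choice.uncond_logprob
    raise ValueError(f"unknown normalization: {norm}")


def accuracy(choices: list[Choice], gold: int, norm: str | None = None) -> int:
    """Return 1 if the highest-scoring choice is the gold one, else 0 (ties go to the first)."""
    scores = [score(c, norm) for c in choices]
    best = max(range(len(scores)), key=scores.__getitem__)
    return int(best == gold)


# Toy, made-up scores for two choices; choice 0 is the gold answer.
choices = [
    Choice(" Paris", logprob=-2.0, num_tokens=1, uncond_logprob=-8.0),
    Choice(" the city of Lyon", logprob=-4.0, num_tokens=4, uncond_logprob=-9.0),
]
for norm in (None, "char", "token", "pmi"):
    print(norm, accuracy(choices, gold=0, norm=norm))
print("raw prob:", raw_probability(choices, 0), "prob mass:", probability_mass(choices, 0))
```

On these toy numbers the unnormalized and PMI scores pick the gold choice while the character and token normalizations do not, which shows that the choice of normalization can change the outcome. Note that PMI, unlike the other options, needs an extra log-likelihood computation per choice (the context-free one) rather than only reusing scores that are already available.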

References

#276 - Probability Metric + New Normalization

Author

hynky1999

Parents

cdeb6c2d
