lighteval
Add swiss legal evals as new community tasks
#389
Open

JoelNiklaus wants to merge 69 commits into huggingface:main from JoelNiklaus:add_swiss_legal_evals
JoelNiklaus
JoelNiklaus Add swiss legal evals as new community tasks
e2a27a72
clefourrier requested a review from hynky1999 1 year ago
clefourrier commented on 2024-11-12
JoelNiklaus Removed nltk and numpy dependencies.
aa409c83
JoelNiklaus Added short dataset descriptions.
a8ee2a5c
hynky1999
JoelNiklaus
JoelNiklaus
clefourrier Merge branch 'main' into add_swiss_legal_evals
8f688444
clefourrier
JoelNiklaus
JoelNiklaus Removed open judge models and added COMET and METEOR.
c7f70380
clefourrier Merge branch 'main' into add_swiss_legal_evals
0ca5af6a
NathanHB commented on 2024-11-19
NathanHB Merge branch 'main' into add_swiss_legal_evals
1d51a01f
NathanHB
JoelNiklaus Ran pre-commit hooks.
5d41ce0a
JoelNiklaus
JoelNiklaus Changed prompt template.
81941254
JoelNiklaus Added legal translation specific judge prompt.
c58ae447
JoelNiklaus Improved judge prompt.
ff3705f9
JoelNiklaus Changed metric selection.
091ec113
JoelNiklaus Made generation_size dependent on the config.
5a479564
JoelNiklaus Fixed error in config.
6bf7fa24
JoelNiklaus Fixed error in config.
6cf1c2ac
JoelNiklaus Added support for multiple devices.
b5488017
JoelNiklaus Fixed some bugs for evaluation on GPUs.
ee2a83c0
JoelNiklaus Added batch inference for heavy metrics and multiplied each score by …
36b7e943
JoelNiklaus Added few shot examples and did some refactoring.
5ba218f8
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
8490841e
JoelNiklaus Switched to an own judge class.
576b847b
JoelNiklaus
clefourrier
JoelNiklaus
JoelNiklaus Fixed issue with judge metric not showing up in results.
41bb59ae
JoelNiklaus Fixed issue with evaluation on GPUs.
d82cd91a
JoelNiklaus Speed up metric computation on GPUs.
1b13d9fc
JoelNiklaus Added more logging.
df0f3f02
JoelNiklaus Switched to sample level scores for faster evaluation.
980c2571
JoelNiklaus Added rescale_with_baseline for BERTScore for better differentiation.
9a60dc0f
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
8c7814fc
JoelNiklaus Adapted metrics.
819b949c
JoelNiklaus Switched to sacrebleu implementation for sentence level translation m…
e758316f
JoelNiklaus Added more stop sequences.
d08163fa
JoelNiklaus Made stop_sequence level specific.
86c67bc3
JoelNiklaus Added gemba metric.
f1099455
JoelNiklaus Updated logging.
f357176e
JoelNiklaus Updated stop_sequence.
2d4c0ed8
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
44ad734c
JoelNiklaus Made metric selection easier.
7b779727
JoelNiklaus Fixed dict issue.
fcd95052
JoelNiklaus Added metric dependencies.
5a8ca464
JoelNiklaus Moving metrics to extended tasks.
bab94af4
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
37468493
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
ddaadbf2
JoelNiklaus Added support for judges from different providers.
09be56d8
JoelNiklaus Added additional system and user prompts and few shot examples.
0aa86077
JoelNiklaus Removed debug relics.
c49e1e23
JoelNiklaus Fixed issue in judge prompt.
4418e82b
JoelNiklaus Adapted getting predictions to new way for all metrics.
075ebd2e
JoelNiklaus Added gemba mqm metric by default.
8ee2dbc7
JoelNiklaus Fixed error in gemba score when errors are no dicts.
4408d0d0
JoelNiklaus Added different judge configurations for gpt 4o.
be6d9abe
JoelNiklaus Fixed typo.
c7ca83f5
JoelNiklaus Disabled short metrics for evaluation of longer sequences.
930cbc57
JoelNiklaus Added xcomet metrics to sentence level metrics.
61058b16
JoelNiklaus Fixed error in bleurt and enabled lazy loading of metrics to save on …
e043ee81
JoelNiklaus Refactored judge metric creation.
1c38c0ab
JoelNiklaus Improved detailed judge prompt and changed secondary judge models fro…
e05ac6a0
JoelNiklaus Changed judge order.
0aed0632
JoelNiklaus Merge branch 'main' into add_swiss_legal_evals
d9078a7f
JoelNiklaus Fixed stop sequence issue in press releases.
46eb62ae
JoelNiklaus Fixed error in xcomet scores.
a78bc03b
JoelNiklaus Made metric groups more easily configurable.
f6b50b4c
JoelNiklaus Made comet score more robust.
7f36065c
JoelNiklaus Moved unpack to the pipeline code.
cb6bfb41
rolshoven Merge branch 'huggingface:main' into add_swiss_legal_evals
306ee766
JoelNiklaus Fixed bug in comet score.
866e7708
JoelNiklaus Added additional judge prompt configurations.
e7f9a096
JoelNiklaus Fixed judge setup.
186a6c83
JoelNiklaus Added more judge models.
c62647e8
NathanHB
JoelNiklaus
NathanHB
JoelNiklaus Made the best judge the default.
2610c920
