Pull Requests huggingface/lighteval

Improve NarrativeQA metrics and prompt structure

#1147 opened 2026-01-22 19:40 by pjavanrood

Fix schema validation in olympiad_bench Doc.specific

#1145 opened 2026-01-22 19:40 by pjavanrood

Fix key mismatch and context access in PubMedQA

#1143 opened 2026-01-22 19:40 by pjavanrood

Fix TypeError in real_toxicity_prompts

#1141 opened 2026-01-22 19:39 by pjavanrood

Fix column mismatch and metric in SimpleQA

#1139 opened 2026-01-22 19:39 by pjavanrood

Fix subset names in StoryCloze

#1137 opened 2026-01-22 19:39 by pjavanrood

Fix Doc init and missing metadata in Summarization tasks

#1135 opened 2026-01-22 19:39 by pjavanrood

Fix hardcoded path in tiny_benchmarks

#1133 opened 2026-01-22 19:39 by pjavanrood

Fix KeyError in truthful_qa_generative_prompt

#1131 opened 2026-01-22 19:39 by pjavanrood

Fix MT-Bench multi-turn evaluation logic

#1129 opened 2026-01-22 19:39 by pjavanrood

Fix specific error in truthfulqa

#1127 opened 2026-01-22 06:22 by ChenZiHong-Gavin

Support for retriever-augmented models.

#1125 opened 2026-01-19 05:39 by akshathmangudi

Integrate alyah benchmark

#1117 opened 2026-01-12 06:13 by amztheorytii

When customizing the save path, modify the "save_details" location

#1092 opened 2025-11-29 09:23 by Guncuke

fix(tasks): print also tasks not prefixed by the suite name

#1087 opened 2025-11-27 10:56 by bram-pramono

[EVAL] SciCode (label: new-task)

#1086 opened 2025-11-27 08:02 by akshathmangudi

Evals on the hub

#1082 opened 2025-11-24 12:42 by NathanHB

Feature/tvd mi metric (label: feature)

#1080 opened 2025-11-22 00:27 by zrobertson466920

diskcache for caching (labels: breaking, enhancement)

#1068 opened 2025-11-19 10:29 by f14-bertolotti

graceful shutdown of vllm async (label: bug)

#1064 opened 2025-11-17 13:45 by f14-bertolotti

remove forbidden characters in files, caches and details (label: bug)

#1062 opened 2025-11-17 10:08 by NathanHB

Adds Profbench (label: new-task)

#1041 opened 2025-11-06 12:49 by NathanHB

Fix PERPLEXITY task

#1037 opened 2025-11-04 19:26 by ScottHoang

Legal NLP tasks on Swiss data

#1032 opened 2025-10-31 17:54 by rolshoven

Add support to vllm==0.11.0

#1027 opened 2025-10-22 18:08 by anmarques

Fixes #1023: add custom processing logic for MetricGrouping

#1025 opened 2025-10-22 01:07 by colinzuo

Wrap vllm inputs to be compatible with VLLM>=0.10.2

#1003 opened 2025-10-02 15:03 by JIElite

Fix caching logic

#994 opened 2025-09-25 22:05 by jxmorris12

Fix deberta overflow error (label: bug)

#990 opened 2025-09-24 07:14 by amstu2

run slow tests against vllm and transformers main

#985 opened 2025-09-23 08:55 by NathanHB