huggingface/lighteval
Commits
vllm_math_verify_fixes
Merge branch 'vllm_math_verify_fixes' of github.com:huggingface/lighteval into vllm_math_verify_fixes
Hynek Kydlicek
committed
285 days ago
3bb6d5e7
🥰 pretty 🥰
Hynek Kydlicek
committed
285 days ago
97515c13
Merge branch 'main' into vllm_math_verify_fixes
hynky1999
committed
285 days ago
Verified
068fdc06
nits
Hynek Kydlicek
committed
285 days ago
6be8eda1
Fixing backend error in main_sglang. (#597)
TankNee
committed
285 days ago
Verified
f2ddc520
Add subsets for lcb (#587)
plaguss
committed
291 days ago
Verified
ed084813
adds aime24, 25 and math500 (#586)
NathanHB
committed
292 days ago
Verified
4c9af85c
docs: update README to reflect new model evaluation entry points (#581)
czakop
committed
292 days ago
Verified
066f84f7
parse seed for vllm (#585)
eldarkurtic
committed
292 days ago
Verified
95068aa6
Push details without converting fields to str (#572)
NathanHB
committed
292 days ago
Verified
7b421132
Add turkish and word (#583)
bezir
committed
293 days ago
Verified
bd578a84
Fix vLLM generation with sampling params (#578)
lewtun
committed
296 days ago
Verified
ebb7377b
Humanity's last exam (#520)
clefourrier
committed
299 days ago
Verified
782afe89
Let lighteval support sglang (#552)
Jayon02
committed
299 days ago
Verified
086cf905
raise exception when generation size is more than model length (#571)
NathanHB
committed
299 days ago
Verified
bee02f7e
Add extended task for LiveCodeBench codegeneration (#548)
plaguss
committed
299 days ago
Verified
fd479ee6
allows better flexibility for litellm endpoints (#549)
NathanHB
committed
304 days ago
Verified
d6de1fe2
typo(vllm): `gpu_memory_utilisation` typo (#553)
tpoisonooo
committed
305 days ago
Verified
fac17bb6
[VLLM] Allows for max tokens to be set in model config file (#547)
NathanHB
committed
305 days ago
Verified
78b68abb
fix: broken URLs (#550)
deep-diver
committed
305 days ago
Verified
da119e81
Fix loading of vllm model from files (#533)
NathanHB
committed
307 days ago
Verified
d4e6f59b
Fix VLLM data-parallel (#541)
hynky1999
committed
311 days ago
Verified
86f62259
Bug fix extractive match (#540)
hynky1999
committed
311 days ago
Verified
3c9b0c9d
Update README.md (#539)
clefourrier
committed
311 days ago
Verified
f8405eee
Pass@k (#519)
clefourrier
committed
311 days ago
Verified
441d7a4a
Make BLEURT lazy (#536)
hynky1999
committed
311 days ago
Verified
15bdbb81
Add GPQA for instruct models (#534)
lewtun
committed
312 days ago
Verified
1ce7331f
Sync Math-verify (#535)
hynky1999
committed
312 days ago
Verified
cb35beae
Add custom task (bac-fr) for evaluation of models in french (#518)
mdiazmel
committed
314 days ago
Verified
d7a1f112
Update french_evals.py
clefourrier
committed
314 days ago
Verified
be7da176