huggingface/lighteval
Pull Requests
Commits
pr-756
93a06bd3  fix tests + make it pretty (hynky1999, committed 210 days ago)
9ddd98f3  fix token metrics, update nanotron to lighteval main (Hynek Kydlicek, committed 210 days ago)
ff19cc84  add new tasks (Hynek Kydlicek, committed 211 days ago)
60631af9  Only use relevant parts of nanotron config / make qa evals cheaper for probs (Hynek Kydlicek, committed 211 days ago)
aca8ff4c  fixes (NouamaneTazi, committed 211 days ago)
a9c31e80  style (anton-l, committed 211 days ago)
3c9296ee  style (anton-l, committed 211 days ago)
14e4b33d  . (NouamaneTazi, committed 211 days ago)
4d37aa32  . (NouamaneTazi, committed 211 days ago)
a4bff20b  . (NouamaneTazi, committed 211 days ago)
189e6443  ml task fixes (anton-l, committed 211 days ago)
b820bde8  nanotron updates (anton-l, committed 211 days ago)
38dcfd90  nanotron updates (anton-l, committed 211 days ago)
1607dc10  Adds multimodal support and MMMU pro (#675) (NathanHB, committed 211 days ago, Verified)
63be4b0f  Added Flores 200 (#717) (clefourrier, committed 211 days ago, Verified)
d18f11ab  Update main_endpoint.py (#739) (clefourrier, committed 211 days ago, Verified)
a5903760  fix litellm (#736) (NathanHB, committed 214 days ago, Verified)
c6d1231f  Adds More Generative tasks (#694) (hynky1999, committed 214 days ago, Verified)
f684d359  Update README.md (#733) (clefourrier, committed 215 days ago, Verified)
d3da6b9b  Fix revision arg for vLLM tokenizer (#721) (lewtun, committed 215 days ago, Verified)
04a74a28  Added support for quantization in vLLM backend (#690) (SulRash, committed 218 days ago, Verified)
f7392fa0  Fix tqdm logging (#711) (Vectorrent, committed 218 days ago, Verified)
7477de09  add livecodebench v6 (#712) (Cppowboy, committed 218 days ago, Verified)
c8714563  Patch 0.9.1 (#708) (NathanHB, committed 223 days ago, Verified)
20cff959  Bump dev version0.9.1.dev0 (#705) (NathanHB, committed 224 days ago, Verified)
42c9c61a  better release notes (#704) (NathanHB, committed 224 days ago, Verified)
77efde06  fix typos (#702) (omahs, committed 224 days ago, Verified)
9bf210c0  Update README.md (#703) (NathanHB, committed 224 days ago, Verified)
039debce  [FIX] Inference providers (#701) (clefourrier, committed 224 days ago, Verified)
40626e7a  docs: improve consistency in punctuation of metric list (#605) (mariagrandury, committed 224 days ago, Verified)