huggingface/lighteval
Commits
nathan-add-integration-tests-2
fix vlm details
NathanHB
committed
90 days ago
5eba9f33
dont log to cli when doing slow tests and log nvidia smi
NathanHB
committed
90 days ago
b58b0635
Merge branch 'nathan-add-integration-tests' of github.com:huggingface/lighteval into nathan-add-integration-tests
NathanHB
committed
90 days ago
2f683f4a
use math.isclose for metrics and fix path to vlm details
NathanHB
committed
90 days ago
c1af85a3
Update tests/slow_tests/sample_comparison.py
NathanHB
committed
90 days ago
4066608f
Update tests/slow_tests/sample_comparison.py
NathanHB
committed
90 days ago
d1d556d7
Apply suggestion from @Copilot
NathanHB
committed
90 days ago
e6f86b5f
Apply suggestion from @Copilot
NathanHB
committed
90 days ago
4dfcc18e
Apply suggestion from @Copilot
NathanHB
committed
90 days ago
291178df
Apply suggestion from @Copilot
NathanHB
committed
90 days ago
3dd4338e
only compare the text results
NathanHB
committed
90 days ago
6b6af7ad
add samples compare for vlm
NathanHB
committed
90 days ago
b2cce05f
compare logprobs ranking instead of values
NathanHB
committed
90 days ago
24a20c38
modify sample to have temp = 0
NathanHB
committed
90 days ago
1b39fcd1
get actual samples
NathanHB
committed
90 days ago
ecf14b9a
get actual samples
NathanHB
committed
90 days ago
a2d4267b
revert undeed changes
NathanHB
committed
91 days ago
88d03926
revert undeed changes
NathanHB
committed
91 days ago
cdc9f45d
fix logprobs compares for different hardware
NathanHB
committed
91 days ago
6068ec89
fix logprobs compares for different hardware
NathanHB
committed
92 days ago
6eb013c3
Merge branch 'nathan-add-integration-tests' of github.com:huggingface/lighteval into nathan-add-integration-tests
NathanHB
committed
92 days ago
8955542d
adding reference details
NathanHB
committed
92 days ago
3ae1c4be
working state
NathanHB
committed
92 days ago
70eeb9e8
Merge branch 'main' into nathan-add-integration-tests
NathanHB
committed
92 days ago
0a0f3cb9
working state
NathanHB
committed
92 days ago
3aefbcec
Add auto tests for metrics (#939)
NathanHB
committed
92 days ago
9d6b9126
compares sample to sample when doing slow tests
NathanHB
committed
93 days ago
0103a989
Add IFBench (#944)
clefourrier
committed
93 days ago
46663377
Added `backend_options` parameter to llm judges. (#963)
rolshoven
committed
93 days ago
d90e3a5f
Multilingual extractiveness (#956)
rolshoven
committed
93 days ago
96c2a4a5