huggingface/lighteval
Commits
nathan-bump-lighteval-version
3f2e90a1 Merge branch 'main' into nathan-bump-lighteval-version (clefourrier, 1 year ago, Verified)
9b3813ff Change the eos condition for GSM8K (#85) (clefourrier, 1 year ago, Verified)
3b0aa23d Fix parallel data processing bug (#92) (clefourrier, 1 year ago, Verified)
2d529ac8 add license header to src files (#89) (NathanHB, 1 year ago, Verified)
458d50b0 Sets a max length for the MATH task (#83) (clefourrier, 1 year ago, Verified)
5ba92603 bump git python (#90) (NathanHB, 1 year ago, Verified)
410fd908 update pyproject ruff config to match new version (Nathan Habib, 1 year ago)
7bf40877 Tidy up dependency groups (#81) (lewtun, 1 year ago, Verified)
927e63ef Create LICENSE (#86) (clefourrier, 1 year ago, Verified)
d22538ed Merge branch 'main' into nathan-bump-lighteval-version (NathanHB, 1 year ago, Verified)
9ecab065 Upgrade huggingface_hub to fix datasets import and add trust_remote_code in datasets (#84) (clefourrier, 1 year ago, Verified)
bc7a3dbe Release: v0.2.0 (Nathan Habib, 1 year ago)
b9d02770 Relax sentencepiece version (#74) (lewtun, 1 year ago, Verified)
030945b1 Update ruff (#71) (clefourrier, 1 year ago, Verified)
e49585da Now manages no generation size is set in a generative task description (#76) (clefourrier, 1 year ago, Verified)
cabef7c4 Fixes wikitext prompts + some patches on tg models (#64) (clefourrier, 1 year ago, Verified)
acffc1a8 Adding custom metric system + IFEval as an example (#48) (clefourrier, 1 year ago, Verified)
3785d852 Just adding the custom metrics system (#65) (clefourrier, 1 year ago, Verified)
49074998 Fixes chat template application to choices (#67) (clefourrier, 1 year ago, Verified)
449817f6 Remove the eos token override in the Default Config Task (#54) (clefourrier, 1 year ago, Verified)
589e6b0d update (#58) (thomwolf, 1 year ago, Verified)
6a3e3b92 Update leaderboard task set (#60) (lewtun, 1 year ago, Verified)
480d85ef Tweak installation / usage sections of README (#55) (lewtun, 1 year ago, Verified)
090101f1 Adding support for Arabic benchmarks : AceGPT benchmarking suite (#44) (alielfilali01, 1 year ago, Verified)
92e9b505 New mechanism for evaluation contributions (#47) (clefourrier, 1 year ago, Verified)
831ad47b Add GPQA (#42) (clefourrier, 1 year ago, Verified)
81fc8fda Improve the current chat template system (#38) (clefourrier, 1 year ago, Verified)
fb57ffc6 bump transformers to 4.38 (#46) (NathanHB, 1 year ago, Verified)
62abc78c Add an automatic system to compute average for tasks with subtasks (clefourrier, 1 year ago, Verified)
77c20164 Update README.md (#37) (clefourrier, 1 year ago, Verified)