huggingface/lighteval
Commits
test_caching
ce0015e2  fix tests (clefourrier, 125 days ago)
35b7e94e  we mock iep model creation (clefourrier, 125 days ago)
730f226a  added doc page to tree (clefourrier, 125 days ago)
f4d5b145  Merge branch 'main' into test_caching (clefourrier, 125 days ago, Verified)
d2ff97c0  we separate the interface function from the actual logic in models, and wrap up tests (clefourrier, 125 days ago)
bfa60767  Fix vLLM doc (#912) (lewtun, 125 days ago, Verified)
348b7d23  wip tests (clefourrier, 125 days ago)
19fa6d13  cleanup, some things hanging when being removed from the refacto - no idea why the imports did not fail (clefourrier, 125 days ago)
ba776ab8  remove useless parameter (clefourrier, 125 days ago)
3a9d1fbe  removed loglikelihood single tokens since it was removed everywhere in theory - added docs on caching (clefourrier, 125 days ago)
94e31f18  we now actively load cached samples after processing the needed items (clefourrier, 128 days ago)
03eb193c  cache management is working well with no DP for accelerate - need to 1)test with DP 2) add a system where we load cached samples in mem *after* processing the other items (clefourrier, 128 days ago)
187aaa72  Merge branch 'main' into test_caching (clefourrier, 128 days ago)
205ab1e5  fixed caching system for predictions, and better logging (clefourrier, 128 days ago)
39c03e7c  added caching to transformers (clefourrier, 128 days ago)
76c8b6e8  Merge branch 'main' into test_caching (clefourrier, 128 days ago)
7693a0fd  Fix log system prompt (#907) (clefourrier, 128 days ago, Verified)
d0cd4c91  Moved some files (#905) (clefourrier, 129 days ago, Verified)
ea1dd184  Debug log likelihood evals which are broken for accelerate (#901) (clefourrier, 130 days ago, Verified)
64f93b0e  Number of fixes to run accelerate evaluations (#898) (clefourrier, 131 days ago, Verified)
865335e4  Update transformers and vllm dependencies (#899) (clefourrier, 131 days ago, Verified)
d7beacbc  Added post processing (for reasoning tokens) to pipeline (#882) (clefourrier, 132 days ago, Verified)
994e9e07  fix path concatenation (#891) (clefourrier, 132 days ago, Verified)
24895519  Fix DP>1 & TP>1 evals with vllm (#841) (NouamaneTazi, 135 days ago, Verified)
99bfd9f2  Adds continuous batching (#850) (NathanHB, 135 days ago, Verified)
1404ba1d  Fix typo: CONCURENT_CALLS -> CONCURRENT_CALLS (#884) (muupan, 135 days ago, Verified)
3b1126ad  Fixed typo in Python API documentation (#862) (dtung8068, 135 days ago, Verified)
e7d70902  Automatically infer whether to use a chat template or not instead of using kwargs (#885) (clefourrier, 135 days ago, Verified)
aaae0ea2  not working (clefourrier, 135 days ago)
e0cd1330  init (clefourrier, 136 days ago)