lighteval
880bebef - Adds inspectai (#1022)

Commit
87 days ago
Adds inspectai (#1022)

Adds inspect-ai as a backend for lighteval! Offloading backend implementation and maintenance to inspect-ai gives us:

- better logs
- better parallelization
- easier task additions

Tasks compatible with inspect-ai so far (eventually all tasks will be compatible):

- gpqa (few-shot compatible)
- ifeval
- hle
- gsm8k (few-shot compatible)
- agieval
- aime24, aime25

### Run llama3.1-8b using all providers on `hf-inference-providers` on `gpqa`, `agieval` and `aime25`:

```
lighteval eval hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:cerebras \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:featherless-ai \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:fireworks-ai \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:novita \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nebius \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:sambanova \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:scaleway \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nscale \
  "lighteval|gpqa|0,lighteval|agieval|0,lighteval|aime25|0" \
  --max-connections 50 --timeout 30 --retry-on-error 1 --max-retries 5 --epochs 1 --max-samples 1
```

Result:

| Model                                                                | agieval | aime25 | gpqa |
|----------------------------------------------------------------------|--------:|-------:|-----:|
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:cerebras       |    0.53 |      0 | 0.33 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:featherless-ai |    0.71 |      1 | 0.75 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:fireworks-ai   |    0.71 |      0 | 0.25 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nebius         |    0.53 |      0 | 0.20 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:novita         |    0.65 |      0 | 0.75 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:sambanova      |    0.71 |      0 | 0.25 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:scaleway       |    0.35 |      0 | 0.25 |

### Compare few-shot diff on gsm8k

```
lighteval eval hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:cerebras \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:featherless-ai \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:fireworks-ai \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:novita \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nebius \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:sambanova \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:scaleway \
  hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nscale \
  "lighteval|gsm8k|0,lighteval|gsm8k|3" \
  --max-connections 50 --timeout 30 --retry-on-error 1 --max-retries 5 --epochs 1 --max-samples 1
```

| Model                                                                | gsm8k | gsm8k_3_shots |
|----------------------------------------------------------------------|------:|--------------:|
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:cerebras       |   0.6 |           0.7 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:featherless-ai |   0.7 |           0.7 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:fireworks-ai   |   0.7 |           0.8 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:nebius         |   0.6 |           0.7 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:novita         |   0.5 |           0.7 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:sambanova      |   0.7 |           0.7 |
| hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:scaleway       |   0.4 |           0.8 |

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
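The multi-model commands above can be stripped down to a minimal run for local testing; this is a sketch assembled from the pieces shown above (one model spec, one task triplet, the same flags), assuming the CLI accepts any subset of those flags:

```shell
# Minimal sketch: one model, one task, 3 few-shot examples.
# Assumes a Hugging Face token with inference-provider access is configured.
lighteval eval hf-inference-providers/meta-llama/Llama-3.1-8B-Instruct:cerebras \
  "lighteval|gsm8k|3" \
  --max-connections 50 --timeout 30 --max-samples 1
```

The model spec combines the backend (`hf-inference-providers`), the model id, and the serving provider after the colon, which is what makes the per-provider comparisons above possible.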
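The task strings above appear to follow a `suite|task|num_fewshot` convention, with multiple tasks joined by commas (e.g. `gsm8k` at 0 and 3 shots in the comparison). A tiny standalone parser illustrating that convention; `parse_task_spec` is a hypothetical helper for exposition, not part of lighteval's API:

```python
def parse_task_spec(spec: str) -> list[tuple[str, str, int]]:
    """Split a comma-separated task string into (suite, task, num_fewshot)
    triplets, following the suite|task|fewshot convention used above."""
    triplets = []
    for item in spec.split(","):
        suite, task, shots = item.split("|")
        triplets.append((suite, task, int(shots)))
    return triplets

# The gsm8k few-shot comparison from the command above:
print(parse_task_spec("lighteval|gsm8k|0,lighteval|gsm8k|3"))
# → [('lighteval', 'gsm8k', 0), ('lighteval', 'gsm8k', 3)]
```

This also explains the `gsm8k` vs `gsm8k_3_shots` columns in the results table: the same task evaluated at two few-shot settings.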