llama.cpp
examples : add llama-eval
#21152
Open

ggerganov wants to merge 51 commits into master from gg/scripts-eval
gatbontonpc working llama-eval mc and math suite
c05df17c
gatbontonpc multi source llama-eval
c2d83ca0
gatbontonpc Add readme
89cab3db
gatbontonpc add checkpointing
88390375
ggerganov examples: add llama-server simulator for testing eval scripts
07d5e1e0
ggerganov examples: refactor test-simulator.sh for better readability
23d4e21a
ggerganov docs: update llama-eval-discussion.md with session work summary
c87af1d5
ggerganov examples: add simplified llama-eval-new.py for AIME evaluation
5cc2258e
ggerganov docs: remove README.md from llama-eval
a80814e9
ggerganov examples: implement flexible grader system for answer validation
5a1be6ce
ggerganov examples: use HF_HUB_OFFLINE to avoid HF Hub warnings
9453f9de
ggerganov examples: remove HF_HUB_OFFLINE to allow dataset download
87f89309
ggerganov examples: use cached dataset path to avoid HF Hub requests
c2619c18
ggerganov examples: use cached dataset path in simulator to avoid HF Hub requests
04f68721
ggerganov docs: update llama-eval-discussion.md with session work summary
37b26caf
ggerganov examples: add threading support and model parameter to llama-eval-new.py
62b04cef
ggerganov docs: update llama-eval-discussion.md with threading and model parame…
a939f4c4
ggerganov examples: add task summary table to llama-eval-new.py
e79e8d02
ggerganov eval : print progress
812ae13e
ggerganov eval : add prompts
fb1481d6
ggerganov test : fix path
9695e6fe
ggerganov sim : fix answer matching
8156d549
ggerganov eval : support multiple dataset runs
fd90796d
ggerganov minor
68dde884
ggerganov improve grader
d2b10302
ggerganov docs
7751ae27
ggerganov remove old files
1db8428f
ggerganov datasets : add gsm8k
e8a80751
ggerganov add gpqa + sampling + docs
cffd268b
ggerganov rename
73e61d5b
ggerganov grader : improve example answers
f762a71d
ggerganov cont
c6315655
ggerganov datasets : add aime2025
99e3c3d0
ggerganov grader : update prompt
52759bf0
ggerganov grade : improve regex + logs
db10dda1
ggerganov datasets : fix aime2025
350e7c14
ggerganov cleanup
de956a6c
ggerganov add AGENTS.md
c6d70b9b
ggerganov ignore errors
ad3a54eb
ggerganov resume eval
e6e777cf
ggerganov cleanup
60a501e1
ggerganov fix counts
7b84af80
ggerganov simplify
6c41664b
ggerganov fix prompts
e2e998a2
ggerganov add html
013963cf
ggerganov store full response
9c29be11
ggerganov add tokens
2ffa45ed
ggerganov resoning and error handling
7f049860
ggerganov refactor
c0c3e428
ggerganov track total time
a3405d42
ggerganov remove junk
1c128d94
github-actions added examples
github-actions added python


Reviewers
No reviews
Assignees
No one assigned