examples : add llama-eval #21152
working llama-eval mc and math suite
c05df17c
multi source llama-eval
c2d83ca0
Add readme
89cab3db
add checkpointing
88390375
examples: add llama-server simulator for testing eval scripts
07d5e1e0
examples: refactor test-simulator.sh for better readability
23d4e21a
docs: update llama-eval-discussion.md with session work summary
c87af1d5
examples: add simplified llama-eval-new.py for AIME evaluation
5cc2258e
docs: remove README.md from llama-eval
a80814e9
examples: implement flexible grader system for answer validation
5a1be6ce
examples: use HF_HUB_OFFLINE to avoid HF Hub warnings
9453f9de
examples: remove HF_HUB_OFFLINE to allow dataset download
87f89309
examples: use cached dataset path to avoid HF Hub requests
c2619c18
examples: use cached dataset path in simulator to avoid HF Hub requests
04f68721
docs: update llama-eval-discussion.md with session work summary
37b26caf
examples: add threading support and model parameter to llama-eval-new.py
62b04cef
docs: update llama-eval-discussion.md with threading and model parame…
a939f4c4
examples: add task summary table to llama-eval-new.py
e79e8d02
eval : print progress
812ae13e
eval : add prompts
fb1481d6
test : fix path
9695e6fe
sim : fix answer matching
8156d549
eval : support multiple dataset runs
fd90796d
minor
68dde884
improve grader
d2b10302
docs
7751ae27
remove old files
1db8428f
datasets : add gsm8k
e8a80751
add gpqa + sampling + docs
cffd268b
rename
73e61d5b
grader : improve example answers
f762a71d
cont
c6315655
datasets : add aime2025
99e3c3d0
grader : update prompt
52759bf0
grade : improve regex + logs
db10dda1
datasets : fix aime2025
350e7c14
cleanup
de956a6c
add AGENTS.md
c6d70b9b
ignore errors
ad3a54eb
resume eval
e6e777cf
cleanup
60a501e1
fix counts
7b84af80
simplify
6c41664b
fix prompts
e2e998a2
add html
013963cf
store full response
9c29be11
add tokens
2ffa45ed
reasoning and error handling
7f049860
refactor
c0c3e428
track total time
a3405d42
remove junk
1c128d94