Unify benchmark scripts and knowledge #2374
new benchmark script for multiple model families and tasks
dbcb0d6b
expanding scripts to work with rest of tasks
9214ed68
correctness and skip tests we know will fail
1dad6c43
adding plotting script
eecbc49c