Commit
91 days ago
Add IFBench (#944)

* init, wip
* unrelated but these tasks were buggy
* better suite management: we don't load all optional deps all the time
* upgrade
* singleton + transformer sampling fix in config
* incredible how much code was just pulled from ifeval
* fix test 1
* fix test 2
* fix tests part 1 - also removes fewshot truncation in the task name because it's no longer used anywhere in the code logically
* fix registry mockup
* fixed last tests