Add IFBench (#944)
* init, wip
* unrelated fix: these tasks were buggy
* better suite management: optional deps are no longer all loaded every time
* upgrade
* singleton + transformer sampling fix in config
* incredible how much code was just pulled from ifeval
* fix test 1
* fix test 2
* fix tests part 1 - also removes fewshot truncation from the task name, since it is no longer used anywhere in the code
* fix registry mockup
* fixed last tests