Add custom tasks for evaluation of French models (#505)
* Add tasks for benchmarking French models
* Remove duplicated code; import the metric from the main ifeval file
* Remove 'loglikelihood single token' for running GPQA with vllm
* Change subset for gpqa-fr task