gptq_benchmark_update (#1420)
* add_exllamav2
* style
* fix doc
* simplify script
* style
* update perplexity measure
* Revert "Merge branch 'add_exllamav2' into update-benchmark-gptq"
  This reverts commit f2dbdc2ea13183c353dfa22135d2a7f401a3dbbb, reversing
  changes made to 216213e46e094de9d72614c09b058dceb1b35020.
* Merge branch 'add_exllamav2' into update-benchmark-gptq
* fix arg in llama attention
* flash_attention arg
* Revert "Merge branch 'add_exllamav2' into update-benchmark-gptq"
  This reverts commit 97a7c62b0cf09ad4671a4198958977143a1191cf.
* update benchmark prefill and generate
* replace by use_exllama_v2
* update benchmark arg
* switch to a config_dict instead of disable_exllamav2
* Apply suggestions from code review
  Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
* better tests
* style
* style
---------
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>