fix(server): llama v2 GPTQ (#648)
As per the title, and as reported in:
https://github.com/huggingface/text-generation-inference/issues/601#issuecomment-1641435956
https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/5
Test it:
```
GPTQ_BITS=4 GPTQ_GROUPSIZE=1 text-generation-launcher --model-id TheBloke/Llama-2-70B-chat-GPTQ --port 8080 --num-shard 4 --quantize gptq
```
Then query the running server:
```
curl 127.0.0.1:8080/generate \
-X POST \
-d '{"inputs":"hey llama","parameters":{"max_new_tokens":256}}' \
-H 'Content-Type: application/json'
```
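For repeated checks with different prompts, the request above could be wrapped in a small shell helper. This is only a sketch: `build_payload`, `generate`, and the `HOST` variable are hypothetical names introduced here, and the prompt is assumed to contain no double quotes or backslashes (no JSON escaping is done).

```shell
# Hypothetical helper: build the JSON body expected by /generate.
# $1: prompt (assumed free of double quotes and backslashes)
# $2: max_new_tokens (integer)
build_payload() {
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "$2"
}

# Hypothetical helper: POST the payload to the server started above.
# HOST is an assumption; it defaults to the address used in the curl example.
# $1: prompt, $2: max_new_tokens (defaults to 256, as in the example)
generate() {
  curl "${HOST:-127.0.0.1:8080}/generate" \
    -X POST \
    -d "$(build_payload "$1" "${2:-256}")" \
    -H 'Content-Type: application/json'
}
```

With the launcher running, `generate "hey llama"` sends the same request as the curl command above, and `generate "hey llama" 32` asks for a shorter completion.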