llama.cpp
116efee0 - cuda: add q8_0->f32 cpy operation (#9571)

Committed 1 year ago
cuda: add q8_0->f32 cpy operation (#9571)

llama: enable K-shift for a quantized KV cache. The K-shift will fail on unsupported backends or quantization types.