llama.cpp
116efee0 - cuda: add q8_0->f32 cpy operation (#9571)

Committed 1 year ago
cuda: add q8_0->f32 cpy operation (#9571)

llama: enable K-shift for a quantized KV cache. The K-shift will fail on unsupported backends or quantization types.