llama.cpp
116efee0
- cuda: add q8_0->f32 cpy operation (#9571)
Committed 1 year ago
cuda: add q8_0->f32 cpy operation (#9571)

llama: enable K-shift for quantized KV cache. It will fail on unsupported backends or quant types.
References
#9571 - CUDA: Enable K-shift operation for -ctk q8_0 (limited)
Author
Nekotekina
Parents
0b3bf966