llama : support quantum K cache #4312
llama : support quantum K cache (wip)
d04ee928
metal : add F32 -> Q8_0 copy kernel
bcfebf24
cuda : add F32 -> Q8_0 copy kernel
a1bf6c09
cuda : use mmv kernel for quantum cache ops
b881f630
llama : pass KV cache type through API
3ce30e07
llama : fix build
7864a2cd
metal : add F32 -> Q4_0 copy kernel
9d69ecc0
metal : add F32 -> Q4_1 copy kernel
6b58ae98
cuda : wip
e8457c90
cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
b2acedeb
ggerganov marked this pull request as ready for review 2 years ago
llama-bench : support type_k/type_v
903167a7
metal : use mm kernel only for quantum KV cache
dd86df82
cuda : add comment
4adb1d69
llama : remove memory_f16 and kv_f16 flags
af99c6fb
ggerganov merged commit 1a1a1c38 into gg/per-layer-kv 2 years ago