llama.cpp

llama : support quantum K cache #4312 (Merged)

ggerganov merged 14 commits from gg/quantum-k-cache into gg/per-layer-kv.
Commits:

- d04ee928 (ggerganov) llama : support quantum K cache (wip)
- bcfebf24 (ggerganov) metal : add F32 -> Q8_0 copy kernel
- a1bf6c09 (ggerganov) cuda : add F32 -> Q8_0 copy kernel
- b881f630 (ggerganov) cuda : use mmv kernel for quantum cache ops
- 3ce30e07 (ggerganov) llama : pass KV cache type through API
- 7864a2cd (ggerganov) llama : fix build
- 9d69ecc0 (ggerganov) metal : add F32 -> Q4_0 copy kernel
- 6b58ae98 (ggerganov) metal : add F32 -> Q4_1 copy kernel
- e8457c90 (ggerganov) cuda : wip
- b2acedeb (ggerganov) cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels
ggerganov marked this pull request as ready for review 2 years ago and added the "need feedback" label.
- 903167a7 (slaren) llama-bench : support type_k/type_v
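The llama-bench commit above exposes the K/V cache types as benchmark parameters. A hypothetical usage sketch follows; the flag spellings (-ctk/-ctv) and model path are assumptions based on current llama-bench, not taken from this PR's diff, so check llama-bench --help before relying on them:

```shell
# Compare the default F16 K cache against a quantized Q8_0 K cache.
# Model path and flag names are placeholders/assumptions.
./llama-bench -m models/llama-7b.gguf -ctk f16
./llama-bench -m models/llama-7b.gguf -ctk q8_0
```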
- dd86df82 (ggerganov) metal : use mm kernel only for quantum KV cache
- 4adb1d69 (ggerganov) cuda : add comment
- af99c6fb (ggerganov) llama : remove memory_f16 and kv_f16 flags
ggerganov merged 1a1a1c38 into gg/per-layer-kv 2 years ago.
