PR #4312 llama : support quantum K cache

llama : support quantum K cache #4312

ggerganov merged 14 commits into gg/per-layer-kv from gg/quantum-k-cache

llama : support quantum K cache (wip)

d04ee928

metal : add F32 -> Q8_0 copy kernel

bcfebf24

cuda : add F32 -> Q8_0 copy kernel

a1bf6c09

cuda : use mmv kernel for quantum cache ops

b881f630

llama : pass KV cache type through API

3ce30e07

llama : fix build

7864a2cd

metal : add F32 -> Q4_0 copy kernel

9d69ecc0

metal : add F32 -> Q4_1 copy kernel

6b58ae98

cuda : wip

e8457c90

cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels

b2acedeb

ggerganov marked this pull request as ready for review 2 years ago

ggerganov added need feedback

llama-bench : support type_k/type_v

903167a7

metal : use mm kernel only for quantum KV cache

dd86df82

cuda : add comment

4adb1d69

llama : remove memory_f16 and kv_f16 flags

af99c6fb

ggerganov merged 1a1a1c38 into gg/per-layer-kv 2 years ago

Reviewers

No reviews

Assignees

No one assigned

Labels

need feedback

Milestone

No milestone