llama.cpp

llama : per-layer KV cache #4309

Merged

ggerganov merged 15 commits into master from gg/per-layer-kv
- e9bcf66a slaren: per-layer KV
- 55f2f2fb slaren: remove unnecessary copies
- f4f9367f slaren: less code duplication, offload k and v separately
- c294c78e ggerganov: Merge branch 'master' into per-layer-kv
- 986b3da7 ggerganov: llama : offload KV cache per-layer
- f3dbfb9f ggerganov: llama : offload K shift tensors
- 3d3e6bd0 ggerganov: llama : offload for rest of the model arches

ggerganov added the performance and need feedback labels
slaren commented on 2023-12-03
ggerganov commented on 2023-12-03
- 1fa91a48 ggerganov: llama : enable offload debug temporarily
- c44bc1ee ggerganov: llama : keep the KV related layers on the device
- c80b8a2b ggerganov: llama : remove mirrors, perform Device -> Host when partial offload
- e262947d ggerganov: common : add command-line arg to disable KV cache offloading
- 66aaac98 ggerganov: llama : update session save/load
- 1a1a1c38 ggerganov: llama : support quantum K cache (#4312)

ggerganov added the breaking change label
- 680a99e7 ggerganov: Merge branch 'master' into gg/per-layer-kv
- fc5f3346 ggerganov: readme : add API change notice
ggerganov merged bcc0eb45 into master