llama.cpp
llama : per-layer KV cache
#4309
Merged
ggerganov merged 15 commits into master from gg/per-layer-kv
- per-layer KV (e9bcf66a)
- remove unnecessary copies (55f2f2fb)
- less code duplication, offload k and v separately (f4f9367f)
- Merge branch 'master' into per-layer-kv (c294c78e)
- llama : offload KV cache per-layer (986b3da7)
- llama : offload K shift tensors (f3dbfb9f)
- llama : offload for rest of the model arches (3d3e6bd0)
ggerganov added the performance label
ggerganov added the need feedback label
slaren commented on 2023-12-03
ggerganov commented on 2023-12-03
- llama : enable offload debug temporarily (1fa91a48)
- llama : keep the KV related layers on the device (c44bc1ee)
- llama : remove mirrors, perform Device -> Host when partial offload (c80b8a2b)
- common : add command-line arg to disable KV cache offloading (e262947d)
- llama : update session save/load (66aaac98)
- llama : support quantum K cache (#4312) (1a1a1c38)
ggerganov added the breaking change label
- Merge branch 'master' into gg/per-layer-kv (680a99e7)
- readme : add API change notice (fc5f3346)
ggerganov merged bcc0eb45 into master 1 year ago
Reviewers: slaren, kalomaze
Assignees: no one assigned
Labels: performance, breaking change, need feedback
Milestone: no milestone