llama.cpp
Compute buffer and KV-cache aware layer distribution for multi-GPU inference
#14484

Open

borebot wants to merge 2 commits into ggml-org:master from borebot:kv-compute-buffer-cache-aware-allocation

Implement context-length dependent KV-cache and Compute Buffer aware …

49f92711

Merge branch 'master' into kv-compute-buffer-cache-aware-allocation

aa89ddab

borebot requested a review from CISC 134 days ago

borebot requested a review from ggerganov 134 days ago

Reviewers: CISC, ggerganov

Assignees: No one assigned

Labels: None yet

Milestone: No milestone