llama.cpp
Compute buffer and KV-cache aware layer distribution for multi-GPU inference
#14484
Open


borebot
borebot Implement context-length dependent KV-cache and Compute Buffer aware …
49f92711
steampunque
JohannesGaessler
steampunque
borebot
jacekpoplawski
borebot Merge branch 'master' into kv-compute-buffer-cache-aware-allocation
aa89ddab
borebot requested a review from CISC 61 days ago
borebot requested a review from ggerganov 61 days ago

