Compute buffer and KV-cache aware layer distribution for multi-GPU inference #14484
Implement context-length dependent KV-cache and Compute Buffer aware …
49f92711
Merge branch 'master' into kv-compute-buffer-cache-aware-allocation
aa89ddab