CUDA: refactor ggml_cuda_op + lower GPU latency via quantization on main GPU and tiling #3110
ggerganov approved these changes on 2023-09-11
slaren commented on 2023-09-11
slaren approved these changes on 2023-09-11
CUDA: lower GPU latency + fix Windows performance
92687450