llama.cpp
CUDA: refactor ggml_cuda_op + lower GPU latency via quantization on main GPU and tiling
#3110
Merged

JohannesGaessler
ggerganov
ggerganov approved these changes on 2023-09-11
ggerganov requested a review from slaren 2 years ago
slaren
JohannesGaessler force pushed from c42b3038 to 54f041b6 2 years ago
JohannesGaessler force pushed from 866b502e to c923de70 2 years ago
slaren commented on 2023-09-11
JohannesGaessler force pushed from c923de70 to a599006a 2 years ago
slaren approved these changes on 2023-09-11
JohannesGaessler CUDA: lower GPU latency + fix Windows performance
92687450
JohannesGaessler force pushed from a599006a to 92687450 2 years ago
JohannesGaessler merged d54a4027 into master 2 years ago
Dampfinchen
cebtenzzre
