llama.cpp
CUDA: faster Mixtral prompt processing for partial offloading
#4553
Merged

JohannesGaessler force pushed 1 year ago
JohannesGaessler force pushed to 751687cc 1 year ago
slaren commented on 2023-12-21
JohannesGaessler added commit fcd0c2ca: CUDA: mul_mat_id always on GPU for batches >= 32
JohannesGaessler force pushed from 751687cc to fcd0c2ca 1 year ago
slaren approved these changes on 2023-12-21
JohannesGaessler merged 91544948 into master 1 year ago