llama.cpp
CUDA: faster Mixtral prompt processing for partial offloading
#4553
Merged
JohannesGaessler merged 1 commit into ggml-org:master from JohannesGaessler:cuda-mixtral-partial-pp

JohannesGaessler force pushed to 751687cc (1 year ago)
slaren commented on 2023-12-21
Commit: CUDA: mul_mat_id always on GPU for batches >= 32 (fcd0c2ca)
JohannesGaessler force pushed from 751687cc to fcd0c2ca (1 year ago)
slaren approved these changes on 2023-12-21
JohannesGaessler merged 91544948 into master (1 year ago)
Reviewers: slaren
Assignees: no one assigned
Labels: none yet
Milestone: no milestone