llama.cpp
CUDA: Faster Mixtral prompt processing
#4538
Merged

JohannesGaessler
slaren
slaren commented on 2023-12-19
slaren
ggerganov
JohannesGaessler force pushed from d34bfda3 1 year ago
JohannesGaessler force pushed 1 year ago
JohannesGaessler
JohannesGaessler force pushed 1 year ago
JohannesGaessler
JohannesGaessler marked this pull request as ready for review 1 year ago
slaren
JohannesGaessler
slaren
JohannesGaessler force pushed 1 year ago
JohannesGaessler
slaren commented on 2023-12-20
slaren
JohannesGaessler CUDA: make MoE tensors contiguous for batch size>1
842adecf
JohannesGaessler force pushed to 842adecf 1 year ago
slaren commented on 2023-12-20
JohannesGaessler Update ggml-cuda.cu
967a0146
slaren approved these changes on 2023-12-20
ggerganov approved these changes on 2023-12-20
JohannesGaessler merged 799fc226 into master 1 year ago
ggerganov
