PR #4538 CUDA: Faster Mixtral prompt processing

JohannesGaessler merged 2 commits into ggml-org:master from JohannesGaessler:cuda-mixtral-pp-2

slaren commented on 2023-12-19

JohannesGaessler force pushed from d34bfda3 1 year ago

JohannesGaessler force pushed 1 year ago

JohannesGaessler marked this pull request as ready for review 1 year ago

JohannesGaessler force pushed 1 year ago

slaren commented on 2023-12-20

CUDA: make MoE tensors contiguous for batch size>1

842adecf

JohannesGaessler force pushed to 842adecf 1 year ago

slaren commented on 2023-12-20

Update ggml-cuda.cu

967a0146

slaren approved these changes on 2023-12-20

ggerganov approved these changes on 2023-12-20

JohannesGaessler merged 799fc226 into master 1 year ago

Reviewers

ggerganov

slaren
