CUDA: Faster Mixtral prompt processing #4538
slaren commented on 2023-12-19
slaren commented on 2023-12-20
CUDA: make MoE tensors contiguous for batch size>1 (842adecf)
slaren commented on 2023-12-20
Update ggml-cuda.cu (967a0146)
slaren approved these changes on 2023-12-20
ggerganov approved these changes on 2023-12-20
Assignees: No one assigned