llama.cpp
799fc226 - CUDA: Faster Mixtral prompt processing (#4538)
Commit
1 year ago
CUDA: Faster Mixtral prompt processing (#4538)

* CUDA: make MoE tensors contiguous for batch size > 1

* Update ggml-cuda.cu

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
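The speedup comes from making the MoE (expert) weight views contiguous before the batched matrix multiplication when more than one token is processed at once, so the faster contiguous CUDA matmul path can be used. A minimal sketch of that pattern, assuming the public ggml API (ggml_is_contiguous, ggml_cont, ggml_mul_mat) and a hypothetical helper name; this is an illustration, not the actual change in ggml-cuda.cu:

// Hypothetical sketch: force a strided expert-weight view into contiguous
// memory before a batched matmul (batch size > 1). Illustrative only.
#include "ggml.h"

static struct ggml_tensor * expert_mul_mat(
        struct ggml_context * ctx,
        struct ggml_tensor  * expert_w,   // view into the stacked expert weights
        struct ggml_tensor  * hidden) {   // [n_embd, n_tokens] activations
    if (!ggml_is_contiguous(expert_w)) {
        // copy the strided view into a contiguous buffer first
        expert_w = ggml_cont(ctx, expert_w);
    }
    // contiguous source tensor lets the backend pick its fast matmul kernel
    return ggml_mul_mat(ctx, expert_w, hidden);
}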
References
#4538 - CUDA: Faster Mixtral prompt processing
Author
JohannesGaessler
Parents
328b83de