llama.cpp
ggml-cuda : update rope implementation for parallel decoding #3254
Merged

ggerganov merged 5 commits into custom-attention-mask from cam-cuda
slaren ggml-cuda : update rope implementation for parallel decoding
eec6b66a
slaren marked this pull request as draft 2 years ago
slaren marked this pull request as ready for review 2 years ago
slaren better solution for p0 computation
fb92acdd
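The "p0 computation" being iterated on here is the starting position that feeds the RoPE rotation angle. With parallel decoding, rows of a batch can belong to different sequences at different positions, so a single scalar p0 shared by the whole tensor no longer works; each row needs its own position. Below is a minimal sketch of that idea, assuming positions are already on the device as an int32_t buffer with one entry per row; names and layout are illustrative, not the PR's exact kernel:

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdint.h>

// Sketch: RoPE over a [nrows, ncols] matrix (ncols assumed even) where each
// row (token) carries its own position. pos[row] replaces the old scalar p0,
// so rows from different sequences in the same batch get the right angle.
static __global__ void rope_rows_f32(const float * x, float * dst, const int32_t * pos,
                                     const int nrows, const int ncols,
                                     const float freq_scale, const float theta_scale) {
    const int col = 2*(blockDim.x*blockIdx.x + threadIdx.x); // adjacent pair (col, col+1)
    const int row = blockDim.y*blockIdx.y + threadIdx.y;

    if (col >= ncols || row >= nrows) {
        return;
    }

    const int i = row*ncols + col;

    // per-row position -> rotation angle for this pair of dimensions
    const float theta = freq_scale*(float) pos[row]*powf(theta_scale, (float)(col/2));
    const float sin_theta = sinf(theta);
    const float cos_theta = cosf(theta);

    const float x0 = x[i + 0];
    const float x1 = x[i + 1];

    dst[i + 0] = x0*cos_theta - x1*sin_theta;
    dst[i + 1] = x0*sin_theta + x1*cos_theta;
}
```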
slaren force-pushed to fb92acdd 2 years ago
slaren commented on 2023-09-18
slaren requested a review from JohannesGaessler 2 years ago
slaren fix rope
cbe2bac2
slaren simpler rope implementation
aa18b939
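One thing the per-row lookup buys is a trivial host side: the launcher only has to size the grid over column pairs and rows and forward the device pointer to the positions. A hypothetical launcher for the sketch above, again illustrative rather than the PR's code:

```cuda
// Hypothetical launcher (not the PR's code): one thread per rotated pair
// along x, one row per block along y.
static void rope_rows_f32_cuda(const float * x, float * dst, const int32_t * pos,
                               const int nrows, const int ncols,
                               const float freq_base, const float freq_scale,
                               cudaStream_t stream) {
    const float theta_scale = powf(freq_base, -2.0f/ncols); // base^(-2i/ncols) for pair i
    const dim3 block(256, 1, 1);
    const dim3 grid((ncols/2 + block.x - 1)/block.x, nrows, 1);

    rope_rows_f32<<<grid, block, 0, stream>>>(x, dst, pos, nrows, ncols, freq_scale, theta_scale);
}
```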
ggerganov Merge branch 'custom-attention-mask' into cam-cuda
93352769
ggerganov approved these changes on 2023-09-19
ggerganov merged 7e2b9974 into custom-attention-mask 2 years ago
slaren deleted the cam-cuda branch 2 years ago
JohannesGaessler commented on 2023-09-19
