ggml-cuda : update rope implementation for parallel decoding #3254
ggml-cuda : update rope implementation for parallel decoding
eec6b66a
slaren
marked this pull request as draft 2 years ago
slaren
marked this pull request as ready for review 2 years ago
better solution for p0 computation
fb92acdd
slaren
force pushed to fb92acdd 2 years ago
slaren
commented
on 2023-09-18
fix rope
cbe2bac2
simpler rope implementation
aa18b939
Merge branch 'custom-attention-mask' into cam-cuda
93352769
ggerganov
approved these changes
on 2023-09-19
ggerganov
merged 7e2b9974 into custom-attention-mask 2 years ago
slaren
deleted the cam-cuda branch 2 years ago