llama.cpp
ggml-cuda : update rope implementation for parallel decoding #3254
Merged

ggerganov merged 5 commits into custom-attention-mask from cam-cuda
slaren ggml-cuda : update rope implementation for parallel decoding
eec6b66a
slaren marked this pull request as draft 2 years ago
slaren marked this pull request as ready for review 2 years ago
slaren better solution for p0 computation
fb92acdd
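The "p0 computation" being iterated on here is the starting position that feeds the RoPE rotation angle. With parallel decoding, rows of a batch can belong to different sequences at different positions, so a single scalar p0 shared by the whole tensor no longer works; each row needs its own position. Below is a minimal sketch of that idea, assuming positions are already on the device as an int32_t buffer with one entry per row; names and layout are illustrative, not the PR's exact kernel:

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdint.h>

// Sketch: RoPE over a [nrows, ncols] matrix (ncols assumed even) where each
// row (token) carries its own position. pos[row] replaces the old scalar p0,
// so rows from different sequences in the same batch get the right angle.
static __global__ void rope_rows_f32(const float * x, float * dst, const int32_t * pos,
                                     const int nrows, const int ncols,
                                     const float freq_scale, const float theta_scale) {
    const int col = 2*(blockDim.x*blockIdx.x + threadIdx.x); // adjacent pair (col, col+1)
    const int row = blockDim.y*blockIdx.y + threadIdx.y;

    if (col >= ncols || row >= nrows) {
        return;
    }

    const int i = row*ncols + col;

    // per-row position -> rotation angle for this pair of dimensions
    const float theta = freq_scale*(float) pos[row]*powf(theta_scale, (float)(col/2));
    const float sin_theta = sinf(theta);
    const float cos_theta = cosf(theta);

    const float x0 = x[i + 0];
    const float x1 = x[i + 1];

    dst[i + 0] = x0*cos_theta - x1*sin_theta;
    dst[i + 1] = x0*sin_theta + x1*cos_theta;
}
```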
slaren force-pushed to fb92acdd 2 years ago
slaren commented on 2023-09-18
slaren requested a review from JohannesGaessler 2 years ago
slaren fix rope
cbe2bac2
slaren simpler rope implementation
aa18b939
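One thing the per-row lookup buys is a trivial host side: the launcher only has to size the grid over column pairs and rows and forward the device pointer to the positions. A hypothetical launcher for the sketch above, again illustrative rather than the PR's code:

```cuda
// Hypothetical launcher (not the PR's code): one thread per rotated pair
// along x, one row per block along y.
static void rope_rows_f32_cuda(const float * x, float * dst, const int32_t * pos,
                               const int nrows, const int ncols,
                               const float freq_base, const float freq_scale,
                               cudaStream_t stream) {
    const float theta_scale = powf(freq_base, -2.0f/ncols); // base^(-2i/ncols) for pair i
    const dim3 block(256, 1, 1);
    const dim3 grid((ncols/2 + block.x - 1)/block.x, nrows, 1);

    rope_rows_f32<<<grid, block, 0, stream>>>(x, dst, pos, nrows, ncols, freq_scale, theta_scale);
}
```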
ggerganov Merge branch 'custom-attention-mask' into cam-cuda
93352769
ggerganov approved these changes on 2023-09-19
ggerganov merged 7e2b9974 into custom-attention-mask 2 years ago
slaren deleted the cam-cuda branch 2 years ago
JohannesGaessler commented on 2023-09-19
