llama.cpp
ggml-cuda : add rope f16, restore performance with parallel decoding #3272

Merged

ggerganov merged 4 commits into custom-attention-mask from cam-cuda-2
slaren: "ggml-cuda : add rope f16, restore performance" (488c1fc7)
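
This commit adds an f16 variant of the RoPE CUDA kernel, so rotary embeddings can be applied directly to half-precision data without a round-trip through f32 tensors. A minimal sketch of what such a kernel can look like, assuming a contiguous [n_dims, n_rows] layout and adjacent-pair rotation; the kernel name, signature, and launch layout are illustrative, not the actual ggml-cuda code:

```cuda
// Minimal sketch of a RoPE kernel over f16 data (not the actual ggml-cuda
// implementation): assumes a contiguous [n_dims, n_rows] layout and rotation
// of adjacent element pairs.
#include <cuda_fp16.h>
#include <math.h>

__global__ void rope_f16_kernel(const __half * src, __half * dst,
                                const int * pos,   // position of each row
                                int n_dims, int n_rows, float freq_base) {
    const int col = 2 * (blockIdx.x * blockDim.x + threadIdx.x); // even index of the pair
    const int row = blockIdx.y;
    if (col >= n_dims || row >= n_rows) {
        return;
    }

    const int   i     = row * n_dims + col;
    // theta_i = p * base^(-2i/d), with col = 2i
    const float theta = pos[row] * powf(freq_base, -(float) col / n_dims);
    const float c     = cosf(theta);
    const float s     = sinf(theta);

    // arithmetic in f32, storage in f16
    const float x0 = __half2float(src[i]);
    const float x1 = __half2float(src[i + 1]);

    dst[i]     = __float2half(x0 * c - x1 * s);
    dst[i + 1] = __float2half(x0 * s + x1 * c);
}
```

Keeping the trigonometric math in f32 while storing activations in f16 is the usual mixed-precision pattern: it halves memory traffic without degrading the precision of the rotation.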
slaren commented on 2023-09-19
slaren: "offload KQ_mask with all models" (4c0f2437)
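
The KQ_mask is the additive attention mask that, with parallel decoding, encodes both causality and sequence membership for every (token, KV cell) pair. Offloading it means the mask lives in device memory for all models instead of being read from the host on every evaluation. A rough sketch of building and uploading such a mask; the function, parameter names, and buffer handling here are hypothetical, not llama.cpp's:

```cuda
// Rough sketch of keeping the KQ_mask resident on the GPU. The mask is an
// additive [n_tokens, n_kv] tensor: 0.0f where a token may attend to a KV
// cell, -INFINITY where it may not. Names are hypothetical.
#include <cuda_runtime.h>
#include <math.h>
#include <stdlib.h>

void upload_kq_mask(float * mask_dev, int n_tokens, int n_kv,
                    const int * pos,      // position of each batch token
                    const int * seq,      // sequence id of each batch token
                    const int * cell_pos, // position stored in each KV cell
                    const int * cell_seq, // sequence id of each KV cell
                    cudaStream_t stream) {
    const size_t size = (size_t) n_tokens * n_kv * sizeof(float);
    float * mask_host = (float *) malloc(size);

    for (int t = 0; t < n_tokens; ++t) {
        for (int k = 0; k < n_kv; ++k) {
            // with parallel decoding, a token attends only to cells of its
            // own sequence at earlier-or-equal positions
            const bool visible = cell_seq[k] == seq[t] && cell_pos[k] <= pos[t];
            mask_host[t * n_kv + k] = visible ? 0.0f : -INFINITY;
        }
    }

    cudaMemcpyAsync(mask_dev, mask_host, size, cudaMemcpyHostToDevice, stream);
    cudaStreamSynchronize(stream); // the host buffer is freed next, so wait
    free(mask_host);
}
```

Uploading the mask once per batch keeps the per-layer attention kernels free of host reads, which is where the restored parallel-decoding performance would come from under these assumptions.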
ggerganov: "Merge branch 'custom-attention-mask' into cam-cuda-2" (2e92aefe)
ggerganov force-pushed from 13c8c307 to 2e92aefe 1 year ago
slaren: "fix rope shift" (d30ab79b)
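
The rope-shift path re-rotates cached K values when their positions change, e.g. when the KV cache is shifted. Because RoPE is a pure rotation, rotations compose: R(p + d) = R(d) · R(p), so moving a cached row from position p to p + d only requires rotating it by the angle of d itself; the unrotated values never need to be recovered. A sketch under the same layout assumptions as above, with an illustrative kernel name:

```cuda
// Sketch of a rope-shift kernel (illustrative name and layout). Rotating the
// cached pair by the angle of `shift` alone moves it from position p to
// p + shift, since RoPE rotations compose: R(p + d) = R(d) * R(p).
#include <cuda_fp16.h>
#include <math.h>

__global__ void rope_shift_f16_kernel(__half * k_cache, const int * shift,
                                      int n_dims, int n_rows, float freq_base) {
    const int col = 2 * (blockIdx.x * blockDim.x + threadIdx.x);
    const int row = blockIdx.y;
    if (col >= n_dims || row >= n_rows) {
        return;
    }

    const int   i     = row * n_dims + col;
    const float theta = shift[row] * powf(freq_base, -(float) col / n_dims);
    const float c     = cosf(theta);
    const float s     = sinf(theta);

    const float x0 = __half2float(k_cache[i]);
    const float x1 = __half2float(k_cache[i + 1]);

    // rotate in place: the unshifted values are overwritten
    k_cache[i]     = __float2half(x0 * c - x1 * s);
    k_cache[i + 1] = __float2half(x0 * s + x1 * c);
}
```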
ggerganov approved these changes on 2023-09-20
ggerganov merged e04dc519 into custom-attention-mask 1 year ago
slaren deleted the cam-cuda-2 branch 1 year ago
