ggml-cuda : add rope f16, restore performance with parallel decoding #3272
ggml-cuda : add rope f16, restore performance
488c1fc7
slaren
commented
on 2023-09-19
slaren
commented
on 2023-09-19
offload KQ_mask with all models
4c0f2437
Merge branch 'custom-attention-mask' into cam-cuda-2
2e92aefe
ggerganov
force pushed
to
2e92aefe
2 years ago
fix rope shift
d30ab79b
ggerganov
approved these changes
on 2023-09-20
ggerganov
merged
e04dc519
into custom-attention-mask 2 years ago
slaren
deleted the cam-cuda-2 branch 2 years ago