ggml-cuda : add rope f16, restore performance with parallel decoding #3272
ggml-cuda : add rope f16, restore performance (488c1fc7)
slaren commented on 2023-09-19
offload KQ_mask with all models (4c0f2437)
Merge branch 'custom-attention-mask' into cam-cuda-2 (2e92aefe)
ggerganov force-pushed from 13c8c307 to 2e92aefe 1 year ago
fix rope shift (d30ab79b)
ggerganov approved these changes on 2023-09-20
ggerganov merged e04dc519 into custom-attention-mask 1 year ago
slaren deleted the cam-cuda-2 branch 1 year ago
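For context, the PR title refers to RoPE (rotary position embeddings), which this change extends with an f16 path in the CUDA backend. The rotation itself can be sketched as below; this is a simplified illustration of the standard RoPE formula, not ggml's actual kernel, and the function name and parameters are illustrative only.

```python
import math

def rope(x, pos, theta_base=10000.0):
    # Apply rotary position embedding to one even-length vector
    # (e.g. a single attention head's query or key) at token
    # position `pos`. Illustrative sketch, not ggml's CUDA code.
    d = len(x)
    assert d % 2 == 0
    out = [0.0] * d
    for i in range(0, d, 2):
        # Each dimension pair rotates at its own frequency,
        # decreasing with the pair index.
        freq = theta_base ** (-i / d)
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# At position 0 every angle is 0, so the vector is unchanged.
print(rope([1.0, 1.0, 1.0, 1.0], 0))
```

Because the operation is a pure per-pair rotation, it preserves the vector's norm; an f16 variant trades a little precision in the sin/cos multiplies for halved memory traffic.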