llama.cpp
ggml-cuda : add rope f16, restore performance with parallel decoding
#3272

Merged


ggerganov merged 4 commits into custom-attention-mask from cam-cuda-2

ggml-cuda : add rope f16, restore performance

488c1fc7

slaren commented on 2023-09-19

offload KQ_mask with all models

4c0f2437

Merge branch 'custom-attention-mask' into cam-cuda-2

2e92aefe

ggerganov force pushed from 13c8c307 to 2e92aefe

fix rope shift

d30ab79b

ggerganov approved these changes on 2023-09-20

ggerganov merged e04dc519 into custom-attention-mask

slaren deleted the cam-cuda-2 branch

Reviewers

ggerganov
