llama.cpp
metal : improve FA + improve MoE
#12612
Merged
ggerganov merged 10 commits into master from gg/metal-fa-diff-heads
ggml : FA with different K, V head sizes (CPU) (eb5a8276)
metal : add FA with HS=192 (b3dbc325)
metal : extend FA to support different K and V head sizes (fa5f91da)
metal : add FA vector kernels for heads K 192 and V 128 (ac33a922)
github-actions added labels: testing, ggml, Apple Metal, Nvidia GPU, Vulkan
ggml : restrict op on other backends to equal head sizes (1e0f5ad7)
ggerganov force pushed to 1e0f5ad7 1 year ago
metal : optimize FA-vec kernel (a444d39b)
metal : FA remove mq registers (e1e56f72)
metal : improve MoE mul_mat_id condition (9c2b7835)
ggerganov changed the title from "metal : add FA kernels for different K, V head sizes" to "metal : improve FA + improve MoE" 1 year ago
metal : fix comments + remove unnecessary addition (94af548e)
metal : avoid too much shared memory usage with mul_mat_id (6e035068)
ggerganov merged b4ae5081 into master 1 year ago
ggerganov deleted the gg/metal-fa-diff-heads branch 1 year ago
Reviewers: No reviews
Assignees: No one assigned
Labels: testing, Nvidia GPU, Vulkan, ggml, Apple Metal
Milestone: No milestone