llama.cpp
metal : improve FA + improve MoE
#12612
Merged

ggerganov merged 10 commits into master from gg/metal-fa-diff-heads
ggerganov ggml : FA with different K, V head sizes (CPU)
eb5a8276
ggerganov metal : add FA with HS=192
b3dbc325
ggerganov metal : extend FA to support different K and V head sizes
fa5f91da
ggerganov metal : add FA vector kernels for heads K 192 and V 128
ac33a922
github-actions added testing
github-actions added ggml
github-actions added Apple Metal
github-actions added Nvidia GPU
github-actions added Vulkan
ggerganov ggml : restrict op on other backends to equal head sizes
1e0f5ad7
ggerganov force pushed to 1e0f5ad7 1 year ago
ggerganov metal : optimize FA-vec kernel
a444d39b
ggerganov metal : FA remove mq registers
e1e56f72
ggerganov metal : improve MoE mul_mat_id condition
9c2b7835
ggerganov changed the title from "metal : add FA kernels for different K, V head sizes" to "metal : improve FA + improve MoE" 1 year ago
ggerganov metal : fix comments + remove unnecessary addition
94af548e
ggerganov metal : avoid too much shared memory usage with mul_mat_id
6e035068
ggerganov merged b4ae5081 into master 1 year ago
ggerganov deleted the gg/metal-fa-diff-heads branch 1 year ago
