llama.cpp
metal: speed up Qwen3-VL image encoding on large images by ~11%
#21443
Open

Avidanborisov
Avidanborisov metal: make flash attention support 16 queries per threadgroup
c0788eb3
Avidanborisov metal: use 16 queries per threadgroup and 8 simdgroups in flash atten…
23bd7621
Avidanborisov requested a review 4 days ago
github-actions added the ggml label
github-actions added the Apple Metal label
Avidanborisov Merge branch 'ggml-org:master' into metal-img-encode-optim
7ef7cb05
ggerganov
Avidanborisov
