metal: speed up Qwen3-VL image encoding on large images by ~11% #21443
metal: make flash attention support 16 queries per threadgroup
c0788eb3
metal: use 16 queries per threadgroup and 8 simdgroups in flash atten…
23bd7621
Merge branch 'ggml-org:master' into metal-img-encode-optim
7ef7cb05