llama.cpp
5ec717d1 - ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040)

Commit

48 days ago

ggml-webgpu: makes the flash attn vec path subgroup-aware (#23040)

* ggml-webgpu: make the flash attn vec path compile and size its split/reduce work from the device's reported subgroup range instead of assuming a subgroup size of 32.
* ggml-webgpu: remove the extra max_wg_size >= max_subgroup_size guard, and remove the hardcoded 32 when determining the values of reduce_wg_size and vec_nwg_cap.
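
To illustrate deriving these sizes from the reported subgroup range rather than a fixed 32, here is a minimal C++ sketch. It is not the code from this commit: the struct names, the size_vec_path helper, and the sizing policy itself are assumptions made for illustration. Only reduce_wg_size, vec_nwg_cap, max_wg_size and max_subgroup_size are names taken from the commit message, and the real formulas in ggml-webgpu may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical container for the limits a device reports.
// Field names are illustrative, not ggml-webgpu's actual types.
struct DeviceLimits {
    uint32_t min_subgroup_size;  // smallest subgroup size the device may pick
    uint32_t max_subgroup_size;  // largest subgroup size the device may pick
    uint32_t max_wg_size;        // maximum invocations per workgroup
};

// Hypothetical result type holding the two values named in the commit.
struct VecPathSizes {
    uint32_t reduce_wg_size;  // workgroup size of the reduce pass
    uint32_t vec_nwg_cap;     // upper bound on split workgroups per row
};

// Assumed policy (not confirmed by the commit):
//  - the reduce workgroup spans one largest-size subgroup, clamped to
//    what a workgroup can hold;
//  - the number of split workgroups is capped at the smallest subgroup
//    size, so a single subgroup can still combine every partial result
//    if the device picks its minimum size.
inline VecPathSizes size_vec_path(const DeviceLimits & lim) {
    assert(lim.min_subgroup_size >= 1);
    assert(lim.min_subgroup_size <= lim.max_subgroup_size);
    assert(lim.max_wg_size >= 1);

    VecPathSizes out{};
    out.reduce_wg_size = std::min(lim.max_subgroup_size, lim.max_wg_size);
    out.vec_nwg_cap    = std::min(lim.min_subgroup_size, out.reduce_wg_size);
    return out;
}
```

Under this assumed policy, a device reporting a fixed subgroup size of 32 gets the same values as the old hardcoded path, while a device reporting a range of 8 to 32 gets a smaller vec_nwg_cap of 8.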

References

#23040 - ggml-webgpu: makes the flash attn vec path subgroup-aware

Author

ArberSephirotheca

Parents

0c3e4fcc
