llama.cpp
0c21677e - CUDA: faster FA for GQA > 1 but not power of 2 (#19092)

Commit
27 days ago