llama.cpp
CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8
#7681
Merged