llama.cpp
cuda: optimize SOLVE_TRI using registers and FMAF
#17703
Merged
