llama.cpp
d50f8897 - CUDA: stream-k decomposition for MMQ (#8018)

Commit
1 year ago
CUDA: stream-k decomposition for MMQ (#8018)

* CUDA: stream-k decomposition for MMQ
* fix undefined memory reads for small matrices