llama.cpp
9a590c82 - CUDA: optimize MMQ int8 tensor core performance (#8062)

Commit
1 year ago
CUDA: optimize MMQ int8 tensor core performance (#8062)

* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr