llama.cpp
9a590c82 - CUDA: optimize MMQ int8 tensor core performance (#8062)

Commit

1 year ago

CUDA: optimize MMQ int8 tensor core performance (#8062)

* CUDA: optimize MMQ int8 tensor core performance
* only a single get_mma_tile_x_k function
* simplify code, make functions constexpr

References

#8062 - CUDA: optimize MMQ int8 tensor core performance

Author

JohannesGaessler

Parents

52fc8705
