CUDA: optimize MMQ int8 tensor core performance #8062
slaren commented on 2024-06-22
slaren commented on 2024-06-22
slaren approved these changes on 2024-06-24
CUDA: optimize MMQ int8 tensor core performance (db6dae79)
only a single get_mma_tile_x_k function (cab59819)
simplify code, make functions constexpr (5db21312)
Assignees
No one assigned