Adding CUDA kernel (optimized for sm80) for block-wise 4-bit quantized float16 GEMM #18619
snnn commented on 2023-11-30
adding cuda kernel with tests (99075996)
add compilation flag (7ca652c3)
require cuda 11.4 for cutlass (93ac7e33)
fix comments and rebase on main (cf397577)
chenfucn force pushed from 9c92e1ac to cf397577 2 years ago
refactor blkq4 gemm quant input generation (73679d3f)
lint (423aa1fe)
chenfucn force pushed from 34adf5d5 to 423aa1fe 2 years ago
conflict with main (40de1a14)
remove redundant test function (2d67beaa)
chenfucn force pushed from efe36430 to 2d67beaa 2 years ago
fix misspellings and comments (18bf4636)
variable and type names (7d5d5ca4)
ptx for row blocking no zero-point (b9f9cb76)
optimize column block dequant (31a602f4)
lint (1477c011)
yufenglee approved these changes on 2024-03-05
chenfucn merged 06e684c9 into main 2 years ago