Block-wise 4b quantization matmul operator change #18172
MatMulNBits cpu and cuda changes
1eca8610
python change
34e266da
add quantization fp16 instantiation
04498ceb
fix python test
c628dbd1
chenfucn
changed the title from "Cfu blkq4" to "Block-wise 4b quantization matmul operator change" 2 years ago
lint
80eab050
lintrunner
3fb8ea0f
Update onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc
bd1e0343
Apply suggestions from code review
1bcba510
replace dequant
da19fa99
dequant adjustment
c4148e37
optimize dequant
53e703ee
dequant tail adjustment
3a565c8e
split dequant impl
b05046e1
lint
35978ced
Add quant bits to template parameter
9362e456
lint
8fb89932
yufenglee
approved these changes
on 2023-11-03
edgchen1
approved these changes
on 2023-11-03
chenfucn
merged
26b39641
into main 2 years ago
chenfucn
deleted the cfu_blkq4 branch 2 years ago
Assignees
No one assigned