onnxruntime
Block-wise 4b quantization matmul operator change
#18172
Merged

chenfucn merged 16 commits into microsoft:main from chenfucn:cfu_blkq4
chenfucn: MatMulNBits cpu and cuda changes (1eca8610)
chenfucn: python change (34e266da)
chenfucn: add quantization fp16 instantiation (04498ceb)
chenfucn requested a review 2 years ago
chenfucn: fix python test (c628dbd1)
chenfucn changed the title from "Cfu blkq4" to "Block-wise 4b quantization matmul operator change" 2 years ago
chenfucn: lint (80eab050)
chenfucn: lintrunner (3fb8ea0f)
edgchen1 commented on 2023-10-31
yihonglyu commented on 2023-10-31
yihonglyu commented on 2023-10-31
chenfucn: Update onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc (bd1e0343)
chenfucn: Apply suggestions from code review (1bcba510)
chenfucn: replace dequant (da19fa99)
yufenglee commented on 2023-10-31
yufenglee commented on 2023-10-31
chenfucn: dequant adjustment (c4148e37)
chenfucn: optimize dequant (53e703ee)
chenfucn: dequant tail adjustment (3a565c8e)
chenfucn: split dequant impl (b05046e1)
chenfucn: lint (35978ced)
chenfucn: Add quant bits to template parameter (9362e456)
chenfucn: lint (8fb89932)
yufenglee approved these changes on 2023-11-03
edgchen1 approved these changes on 2023-11-03
chenfucn merged commit 26b39641 into main 2 years ago
chenfucn deleted the cfu_blkq4 branch 2 years ago

Assignees
No one assigned