PR #18172 Block-wise 4b quantization matmul operator change


chenfucn merged 16 commits into microsoft:main from chenfucn:cfu_blkq4

MatMulNBits cpu and cuda changes

1eca8610
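For context on what the operator computes: MatMulNBits multiplies a float matrix A by a weight matrix B. The values of B are stored as 4-bit integer codes, with one scale per block of `block_size` consecutive elements along K.

The pure-Python reference below is a sketch under stated assumptions, not the onnxruntime kernel or its exact layout:

- each column of B is packed two codes per byte, low nibble first, and padded to whole blocks;
- scales are indexed `[n * blocks_per_col + block]`;
- zero points are passed unpacked (one int per block) and default to 8 when absent.

The function names are hypothetical.

```python
from typing import List, Optional


def dequantize_blockwise_4bit(
    packed: List[int],
    scales: List[float],
    K: int,
    N: int,
    block_size: int,
    zero_points: Optional[List[int]] = None,
) -> List[List[float]]:
    """Dequantize a packed K x N weight matrix; returns K rows of N floats.

    Assumed layout (sketch only): column n occupies bytes_per_col consecutive
    bytes, two 4-bit codes per byte with the low nibble first, padded so each
    column holds a whole number of blocks.
    """
    assert block_size % 2 == 0, "sketch assumes an even block size"
    blocks_per_col = (K + block_size - 1) // block_size  # last block may be partial
    bytes_per_col = blocks_per_col * block_size // 2
    B = [[0.0] * N for _ in range(K)]
    for n in range(N):
        for k in range(K):
            byte = packed[n * bytes_per_col + k // 2]
            q = byte & 0x0F if k % 2 == 0 else byte >> 4  # even k -> low nibble
            blk = n * blocks_per_col + k // block_size     # block that k falls in
            zp = zero_points[blk] if zero_points is not None else 8  # assumed default
            B[k][n] = (q - zp) * scales[blk]
    return B


def matmul_nbits_reference(A, packed, scales, K, N, block_size, zero_points=None):
    """Naive A (M x K) @ dequantized B (K x N), intended only for checking results."""
    B = dequantize_blockwise_4bit(packed, scales, K, N, block_size, zero_points)
    return [[sum(row[k] * B[k][n] for k in range(K)) for n in range(N)] for row in A]
```

Consider K=3, N=1, block_size=2 with `packed=[121, 12]`. These bytes encode the codes 9, 7 and 12; the high nibble of the second byte is padding. With scales `[0.5, 2.0]` the column dequantizes to `[0.5, -0.5, 8.0]`. When K is not a multiple of `block_size`, the final block is simply shorter. The sketch handles this by locating each element's block with `k // block_size` and never reading the padding nibbles.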

python change

34e266da

add quantization fp16 instantiation

04498ceb

chenfucn requested a review 2 years ago

fix python test

c628dbd1

chenfucn changed the title from Cfu blkq4 to Block-wise 4b quantization matmul operator change 2 years ago

lint

80eab050

lintrunner

3fb8ea0f

edgchen1 commented on 2023-10-31

yihonglyu commented on 2023-10-31

Update onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc

bd1e0343

Apply suggestions from code review

1bcba510

replace dequant

da19fa99

yufenglee commented on 2023-10-31

dequant adjustment

c4148e37

optimize dequant

53e703ee

dequant tail adjustment

3a565c8e

split dequant impl

b05046e1

lint

35978ced

Add quant bits to template parameter

9362e456
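The "Add quant bits to template parameter" commit makes the bit width a template parameter. Once the width is a parameter, several quantities follow from that one number:

- the unpacking mask;
- the number of codes per byte;
- a midpoint default zero point.

A minimal Python sketch of that dependency follows. In Python, `bits` is an ordinary argument rather than a compile-time template parameter. The helper names are hypothetical, and two details are assumptions of the sketch: restricting `bits` to 2, 4 or 8, and using 2^(bits-1) as the default zero point.

```python
from typing import List, Optional


def unpack_nbits(packed: List[int], count: int, bits: int) -> List[int]:
    """Unpack `count` unsigned `bits`-wide codes, lowest-order bits of each byte first."""
    assert bits in (2, 4, 8), "sketch only handles widths that divide a byte evenly"
    per_byte = 8 // bits       # codes stored in one byte
    mask = (1 << bits) - 1     # e.g. 0x0F for 4-bit codes
    return [(packed[i // per_byte] >> ((i % per_byte) * bits)) & mask for i in range(count)]


def default_zero_point(bits: int) -> int:
    """Midpoint of the unsigned code range, e.g. 8 for 4-bit codes (assumed default)."""
    return 1 << (bits - 1)


def dequantize_block(packed: List[int], count: int, scale: float, bits: int,
                     zero_point: Optional[int] = None) -> List[float]:
    """Dequantize one block: (code - zero_point) * scale for each unpacked code."""
    zp = default_zero_point(bits) if zero_point is None else zero_point
    return [(q - zp) * scale for q in unpack_nbits(packed, count, bits)]
```

In C++, a template parameter lets `per_byte`, `mask` and the default zero point become compile-time constants. This Python version only illustrates how they derive from `bits`.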

lint

8fb89932

yufenglee approved these changes on 2023-11-03

edgchen1 approved these changes on 2023-11-03

chenfucn merged 26b39641 into main 2 years ago

chenfucn deleted the cfu_blkq4 branch 2 years ago

Reviewers

edgchen1

yufenglee

yihonglyu

