onnxruntime
Adding cuda kernel (optimized for sm80) for block-wise 4b quantized float 16 GEMM.
#18619

Merged

chenfucn merged 13 commits into microsoft:main from chenfucn:cfu_kernel

snnn commented on 2023-11-30

yufenglee commented on 2024-01-08

yufenglee commented on 2024-01-09

adding cuda kernel with tests

99075996

add compilation flag

7ca652c3

require cuda 11.4 for cutlass

93ac7e33

fix comments and rebase on main

cf397577

chenfucn force pushed from 9c92e1ac to cf397577 2 years ago

github-advanced-security commented on 2024-01-26

refactor blkq4 gemm quant input generation

73679d3f

github-advanced-security commented on 2024-01-30

lint

423aa1fe

chenfucn force pushed from 34adf5d5 to 423aa1fe 2 years ago

conflict with main

40de1a14

remove redundant test function

2d67beaa

chenfucn force pushed from efe36430 to 2d67beaa 2 years ago

yufenglee commented on 2024-02-08

yufenglee commented on 2024-02-12

fix misspellings and comments

18bf4636

yufenglee commented on 2024-02-15

yufenglee commented on 2024-02-16

yufenglee commented on 2024-02-20

variable and type names

7d5d5ca4

ptx for row blocking no zero-point

b9f9cb76

optimize column block dequant

31a602f4

lint

1477c011

yufenglee approved these changes on 2024-03-05

chenfucn merged 06e684c9 into main 2 years ago

Reviewers

yufenglee

github-advanced-security

snnn

