[MLAS] add q4 quantize and transpose kernel to support MatMulNBits QDQ fuse #21054
fajin-corp force pushed from 14fcdb77 to ed9421a8 (1 year ago)
yufenglee dismissed these changes on 2024-06-19
Commits:
adding qdq quantize for matmulnbits (719f8a77)
added quantizeColumnWise (17202c9f)
updating (5cfd04f1)
added aligning limit and updated quantizeColumnWise (142fa90d)
finished transpose (69a3391e)
added headers and py binding (95aae782)
fix build error (e3a850c3)
adding unaligned code (531f20b7)
limit to 4 bits, and separate out pack aligned and unaligned (5bc92eac)
refactored QuantizeColumnWisePackAligned (2d17f5cd)
finished quantize pack unaligned (6d8c90ce)
updated TransposeColumnWiseQuantizedPackAligned (8d8244c6)
finished TransposeColumnWiseQuantizedPackUnaligned (6fa950de)
fixed one opNotLastAxis (b9904a16)
fixed blocked Q 4-bit multithread bug (31852ab0)
fix build (080e41e1)
fix build (181723d7)
pass ut (1138dd61)
finished benchmarking (9acdede3)
added odd N to benchmark (87ba147a)
fix ci build (d1dac08d)
update benchmark to fix linux build (ab48c1eb)
fix ci lint error (e0ba0695)
resolve comments (c3f3b5e9)
fix ci warning (44c0115e)
fajin-corp dismissed their stale review via 44c0115e (1 year ago)
fajin-corp force pushed from a4fe4c50 to 44c0115e (1 year ago)
yufenglee approved these changes on 2024-06-19
fajin-corp deleted the fajin/qdqmatmulnbitskkernels branch (1 year ago)
Assignees: No one assigned