[MLAS] add q4 quantize and transpose kernel to support MatMulNBits QDQ fuse #21054
fajin-corp force pushed from 14fcdb77 to ed9421a8 (1 year ago)
yufenglee dismissed these changes on 2024-06-19
Commits:
adding qdq quantize for matmulnbits (719f8a77)
added quantizeColumnWise (17202c9f)
updating (5cfd04f1)
added aligning limit and updated quantizeColumnWise (142fa90d)
finished transpose (69a3391e)
added headers and py binding (95aae782)
fix build error (e3a850c3)
adding unaligned code (531f20b7)
limit to 4 bits, and separate out pack aligned and unaligned (5bc92eac)
refactored QuantizeColumnWisePackAligned (2d17f5cd)
finished quantize pack unaligned (6d8c90ce)
updated TransposeColumnWiseQuantizedPackAligned (8d8244c6)
finished TransposeColumnWiseQuantizedPackUnaligned (6fa950de)
fixed one opNotLastAxis (b9904a16)
fixed blocked Q 4-bit multithread bug (31852ab0)
fix build (080e41e1)
fix build (181723d7)
pass ut (1138dd61)
finished benchmarking (9acdede3)
added odd N to benchmark (87ba147a)
fix ci build (d1dac08d)
update benchmark to fix linux build (ab48c1eb)
fix ci lint error (e0ba0695)
resolve comments (c3f3b5e9)
fix ci warning (44c0115e)
fajin-corp dismissed their stale review via 44c0115e (1 year ago)
fajin-corp force pushed from a4fe4c50 to 44c0115e (1 year ago)
yufenglee approved these changes on 2024-06-19
fajin-corp deleted the fajin/qdqmatmulnbitskkernels branch (1 year ago)
Assignees: No one assigned