[MLAS AArch64] SQNBitGemm CompInt8 kernel #18953
only register q4gemm benchmarks if q4gemm is available
8940c0a5
some mlas cmake updates
a6a8ce62
change BlkLen from template param to function param
53a46ca8
Save work
e2a9eee8
only enable benchmark if available
966a9150
handle workspace in benchmark
b59e7e13
QuantizeARow neon impl1
585103be
dot compint8 neon impl
c26cef4f
use single workspace pointer in interface, get matmul_nbits working
1b7d81b4
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_q…
f7e3db50
renaming and cleanup
71bd3a92
try different comp types in matmulnbits
f7127f9f
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_q…
0060f554
rename enum, add doc
b3147c6c
change quant b params from uint8_t* to std::byte*
789bcdcd
handle CompUndef
039dd92b
edgchen1
changed the title from "[MLAS AArch64] SQNBitGemm CompInt8 kernel" to "[WIP][MLAS AArch64] SQNBitGemm CompInt8 kernel" 1 year ago
check if dot product instructions are available before setting SQNBit…
cb9f4287
try to fix compile issue
437ad52a
move zero initialize out of unrolled loop
241ca27d
update comment
53e2ae29
split out float conversion
d5b26b4d
remove impl0_reference
02cf7b37
use thread per gemm in prepare workspace fn, reorder include
5b4a86c7
edgchen1
changed the title from "[WIP][MLAS AArch64] SQNBitGemm CompInt8 kernel" to "[MLAS AArch64] SQNBitGemm CompInt8 kernel" 1 year ago
edgchen1
marked this pull request as ready for review 1 year ago
make pointer const
61998ea6
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_q…
fe7f0e70
remove unneeded and
d54cbd96
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_q…
7d8753cb
move code from merge conflict
6d88a0b4
pack quant b data
ccaa9947
get matmulnbits working, add docs
cff3cb47
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_q…
f8aba0cd
use threadpool to pack b data
33e6dd90
shorten names, update docs
4cd2474c
rename another function, add check for implementation in MlasSQNBitGe…
9244a3f1
move b_data_block_offset out of unrolled loop body
86f84ea0
yufenglee
dismissed these changes
on 2024-01-12
move b data offset out of unrolled loop in compfp32 kernel
23373759
edgchen1
dismissed their stale review
via 23373759
1 year ago
yufenglee
approved these changes
on 2024-01-12
edgchen1
merged
150c4cb8
into main 1 year ago
edgchen1
deleted the edgchen1/sqnbitgemm_quantize_a branch 1 year ago