[MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation #21193
initial impl for m=2 kernel that computes 2x2 outputs at a time, for …
0b3db2bd
support blklen 32, zero point
e338ef04
implement tiling for blklen 32
10505a6f
move to tiling approach for sqnbitgemm compint8 impl
b7bb4d22
fix returned registered test count
fdfd25a8
use variable for HasZeroPoint template parameter value
19f13ffe
split out sqnbitgemm ARM NEON impl into multiple files
abc70102
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_m…
36c47c72
update sqnbitgemm avx code to use new SQ4BitGemmKernel_CompInt8 inter…
f6168081
put impl into unnamed namespace, comment
f5b1817a
edgchen1
changed the title from "[MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation" to "[WIP][MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation" 1 year ago
fix zp loading
e35f2b34
edgchen1
marked this pull request as ready for review 1 year ago
edgchen1
changed the title from "[WIP][MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation" to "[MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation" 1 year ago
fix post processor call arguments
fbc6c8a7
helper functions for advancing row/col ptrs
3d8fe4d1
liqunfu
dismissed these changes
on 2024-07-03
fix indentation
828e8de5
edgchen1
dismissed their stale review
via 828e8de5
1 year ago
yufenglee
approved these changes
on 2024-07-10
edgchen1
merged
20cd3394
into main 1 year ago
edgchen1
deleted the edgchen1/sqnbitgemm_multi_row branch 1 year ago