onnxruntime
[MLAS] AArch64 SQNBitGemm CompInt8 initial multi-row implementation
#21193
Merged

Commits
  • initial impl for m=2 kernel that computes 2x2 outputs at a time, for blklen > 32
    edgchen1 committed 1 year ago
  • support blklen 32, zero point
    edgchen1 committed 1 year ago
  • implement tiling for blklen 32
    edgchen1 committed 1 year ago
  • move to tiling approach for sqnbitgemm compint8 impl
    edgchen1 committed 1 year ago
  • fix returned registered test count
    edgchen1 committed 1 year ago
  • use variable for HasZeroPoint template parameter value
    edgchen1 committed 1 year ago
  • split out sqnbitgemm ARM NEON impl into multiple files
    edgchen1 committed 1 year ago
  • Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_multi_row
    edgchen1 committed 1 year ago
  • update sqnbitgemm avx code to use new SQ4BitGemmKernel_CompInt8 interface
    edgchen1 committed 1 year ago
  • put impl into unnamed namespace, comment
    edgchen1 committed 1 year ago
  • fix zp loading
    edgchen1 committed 1 year ago
  • fix post processor call arguments
    edgchen1 committed 1 year ago
  • helper functions for advancing row/col ptrs
    edgchen1 committed 1 year ago
  • fix indentation
    edgchen1 committed 1 year ago