onnxruntime
MLAS AArch64 quantized int4 Gemm kernel
#18031
Merged

edgchen1 merged 54 commits into main from edgchen1/arm64_q4gemm
edgchen1 change AddBiasAvx to MlasAddBiasForGemm
340ba219
edgchen1 use std::cerr instead of logging in cpuid_uarch.cc
05c04ef2
edgchen1 add infrastructure for q4gemm neon impl
6fe9b81a
edgchen1 make test_main.cc threadpool a unique_ptr to avoid memory leak output…
f7bd0299
edgchen1 initial neon impl of q4gemm
85e76cea
edgchen1 reference implementation of MlasBlkQ4DequantBNeon
922434f2
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
9dfa8c2d
edgchen1 WIP optimize MlasQ4GemmKernelNeon
6048f8c9
edgchen1 fix some bugs, enable implementation for other Q4Types, remove old im…
a88eedc7
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
d83023bc
edgchen1 uncomment an impl
34d40a8f
edgchen1 requested a review from chenfucn 2 years ago
edgchen1 requested a review from yufenglee 2 years ago
edgchen1 commented on 2023-10-19
edgchen1 remove redundant inline
0a4c4343
edgchen1 Merge remote-tracking branch 'origin/edgchen1/arm64_q4gemm' into edgc…
6de63398
edgchen1 save work - got sqnbitgemm tests and a cpu impl
01ac345d
edgchen1 clean up and add doc comments
d7bd7098
github-advanced-security commented on 2023-10-26
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
e39dafb2
yihonglyu commented on 2023-10-26
edgchen1 commented on 2023-10-26
edgchen1 commented on 2023-10-26
edgchen1 commented on 2023-10-26
edgchen1 remove pragma once from cpp file
15f2b268
edgchen1 initial neon impl of MlasSQNBitGemmKernelNeon
a5e51ed3
edgchen1 add benchmark
d2682d0b
edgchen1 rename benchmark fn
2334da9e
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm…
f36e60e3
edgchen1 Fix build issue in reduceminmax.cc.
1761dffe
edgchen1 add block size 128 impl
fe754a60
edgchen1 add block size 128 tests
79934698
edgchen1 fix buffer type
b9c20035
edgchen1 save work
e90de87d
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm…
bdfc0529
edgchen1 add MLAS buffer size fn, remove old reference packing impl
448c4e52
edgchen1 fix some stuff
90f2ab5e
edgchen1 commented on 2023-11-04
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
6ed16f61
edgchen1 revert MlasTesterType change
93108932
edgchen1 remove q4gemm_neon impl
3d60b38b
edgchen1 add file comment
89708c5a
edgchen1 marked this pull request as ready for review 2 years ago
edgchen1 requested a review 2 years ago
edgchen1 changed the title from "[WIP] MLAS AArch64 Q4Gemm kernels" to "MLAS AArch64 quantized int4 Gemm kernel" 2 years ago
edgchen1 compile warnings
07a4a574
edgchen1 fix another compiler warning
e061fa7f
edgchen1 update op attr helpers - add GetAttr returning T and clean up nodisca…
815292c9
edgchen1 change data members to size_t
fc54b7bb
edgchen1 commented on 2023-11-07
edgchen1 remove unused MlasQ4BlkHasZeroPoint
608e0726
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
edgchen1 improve comment
fb756907
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-08
yufenglee commented on 2023-11-09
edgchen1 manual float conversion
c59e435c
edgchen1 revert q4gemm.h and q4gemm_avx512.cpp changes
c6241168
edgchen1 add block length 256 impl
c44c380b
edgchen1 commented on 2023-11-09
edgchen1 address some PR comments
52a721a0
justinchuby commented on 2023-11-10
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
9dee489a
edgchen1 update comment, fix type issue
0fce591a
edgchen1 add k=11008 n=11008 benchmarks
bac7bbfa
edgchen1 remove impl0_reference for MlasSQNBitGemmKernelNeon
f3c01627
edgchen1 fix unused variable
f9a31ed7
edgchen1 requested a review from yufenglee 2 years ago
edgchen1 requested a review from yihonglyu 2 years ago
edgchen1 update headers, fix type
04cb1614
yufenglee commented on 2023-11-13
yufenglee commented on 2023-11-14
yufenglee commented on 2023-11-14
yufenglee commented on 2023-11-14
yufenglee commented on 2023-11-14
yufenglee commented on 2023-11-14
yufenglee commented on 2023-11-14
edgchen1 Merge remote-tracking branch 'origin/main' into edgchen1/arm64_q4gemm
56069ec5
edgchen1 address some comments
5212337c
edgchen1 remove b_data ptrs
97a8f58a
chenfucn commented on 2023-11-14
edgchen1 pass array by reference instead of returning std::array
2a6a9fcf
edgchen1 address comments
8458b754
yufenglee approved these changes on 2023-11-15
edgchen1 merged 0a4d76d9 into main 2 years ago
edgchen1 deleted the edgchen1/arm64_q4gemm branch 2 years ago
