onnxruntime
Mlas int4 int8 with avx2/512
#20687
Merged

liqunfu merged 48 commits into main from liqun/mlas-q4-tile-avx
liqunfu
liqunfu quick adapt llama.cpp to experiment performance. Only works with blkl…
293f121d
liqunfu fire
04c2e560
liqunfu tile 2x4 SQNBITGEMM<4>/BlkLen:32/M:2048/N:4096/K:4096/Threads:1/Symme…
cdfda6fe
liqunfu use one_16_epi16 and accumulate_2blk_dot: SQNBITGEMM<4>/BlkLen:32/M:2…
92dad979
liqunfu apply to M1, BQuant layout pack block (subblk) larger than blklen: SQ…
5418e9c0
liqunfu use new AQuant layout (not work if total M is not RangeCountM): SQNBI…
0401f726
liqunfu apply blksum to blklen32 and 64: SQNBITGEMM<4>/BlkLen:32/M:2048/N:409…
a57eeba0
liqunfu blklen16
f2c33af7
liqunfu requested a review 1 year ago
liqunfu requested a review from edgchen1 1 year ago
liqunfu requested a review from yufenglee 1 year ago
liqunfu requested a review from chenfucn 1 year ago
liqunfu marked this pull request as draft 1 year ago
edgchen1
edgchen1 commented on 2024-05-20
liqunfu impl avx512: SQNBITGEMM<4>/BlkLen:32/M:2048/N:4096/K:4096/Threads:1/S…
0ca24f48
liqunfu changed the title from "Mlas int4 int8 with avx2" to "Mlas int4 int8 with avx2/512" 1 year ago
liqunfu matmul_nbit & fix alignment for sgemm
7f89d5f9
liqunfu merge main
ed0e6661
liqunfu fix mlas benchmark not using multi threads
35d02a6b
liqunfu profiling
b9493adb
liqunfu Merge branch 'liqun/mlas-q4-tile-avx' of https://github.com/microsoft…
c443eb5b
liqunfu sgemm after sq4bit for avx2
ac66951c
liqunfu avx512
42a13056
liqunfu layout to follow compute, M1 separate with M > 1
740031ac
github-advanced-security
github-advanced-security commented on 2024-06-28
liqunfu make avx512 run
1a6031e6
liqunfu Merge branch 'main' into liqun/mlas-q4-tile-avx
283fd2dd
liqunfu avx512 blklen64 pass
d0359391
github-advanced-security
github-advanced-security commented on 2024-07-04
liqunfu pass avx512 blklen32
f329d2dd
liqunfu pass avx512 blklen 16, 128, 256
27cfd9c7
liqunfu pass fp32, refactor sqnbitgemm
edee3198
liqunfu merge main
fb9221a7
liqunfu avx512vnni
c109b4b2
liqunfu merge main
6654d22c
liqunfu avxvnni
4b91bedb
liqunfu rm unused ComputeParallelTasksSGemm
8674b9f1
liqunfu avoid _mm256_dpbusds_avx_epi32 in avx512vnni
e26e29e8
liqunfu fix linux build
2b0307e0
liqunfu Merge branch 'main' into liqun/mlas-q4-tile-avx
40df7827
liqunfu refactor for Arm64
51e97c8f
liqunfu more refactor for Arm64
48e8639d
liqunfu hsum_float_16
705aa1f2
liqunfu hsum_float_16
012e9c46
liqunfu condition for -mavxvnni
21b9138f
yufenglee
yufenglee commented on 2024-07-30
yufenglee
yufenglee commented on 2024-07-30
liqunfu CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 10
1fb1c83e
yufenglee
yufenglee commented on 2024-07-30
liqunfu missed 2 files from (__GNUC__ > 10)
85918e98
yufenglee
yufenglee commented on 2024-07-30
liqunfu missed _mm256_dpbusds_avx_epi32 and print out cmake msgs
9530ac56
liqunfu unused zp, etc.
f77cffd4
liqunfu unused zp, etc.
a6fd3788
liqunfu remove test code changes
c875e5c9
liqunfu remove test code changes
3b56710e
liqunfu lint
746562f6
liqunfu marked this pull request as ready for review 1 year ago
liqunfu lint
52fc7fa8
liqunfu code name
0933a6b8
edgchen1
edgchen1 commented on 2024-07-30
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
yufenglee
yufenglee commented on 2024-07-31
liqunfu update reviewers' comments
2b35c820
edgchen1
edgchen1 commented on 2024-07-31
liqunfu Merge branch 'main' into liqun/mlas-q4-tile-avx
caeb35eb
yufenglee
yufenglee approved these changes on 2024-08-01
liqunfu merged b87e8edb into main 1 year ago
liqunfu deleted the liqun/mlas-q4-tile-avx branch 1 year ago
prathikr added release:1.19.0
prathikr added cherry-picked
snnn
snnn commented on 2024-10-16
Rohanjames1997
