onnxruntime
Mlas int4 int8 with avx2/512
#20687
Merged
liqunfu merged 48 commits into main from liqun/mlas-q4-tile-avx
quick adapt llama.cpp to experiment performance. Only works with blkl…
293f121d
fire
04c2e560
tile 2x4 SQNBITGEMM<4>/BlkLen:32/M:2048/N:4096/K:4096/Threads:1/Symme…
cdfda6fe
use one_16_epi16 and accumulate_2blk_dot: SQNBITGEMM<4>/BlkLen:32/M:2…
92dad979
apply to M1, BQuant layout pack block (subblk) larger than blklen: SQ…
5418e9c0
use new AQuant layout (not work if total M is not RangeCountM): SQNBI…
0401f726
apply blksum to blklen32 and 64: SQNBITGEMM<4>/BlkLen:32/M:2048/N:409…
a57eeba0
blklen16
f2c33af7
liqunfu requested a review 1 year ago
liqunfu requested a review from edgchen1 1 year ago
liqunfu requested a review from yufenglee 1 year ago
liqunfu requested a review from chenfucn 1 year ago
liqunfu marked this pull request as draft 1 year ago
edgchen1 commented on 2024-05-20
impl avx512: SQNBITGEMM<4>/BlkLen:32/M:2048/N:4096/K:4096/Threads:1/S…
0ca24f48
liqunfu changed the title from "Mlas int4 int8 with avx2" to "Mlas int4 int8 with avx2/512" 1 year ago
matmul_nbit & fix alignment for sgemm
7f89d5f9
merge main
ed0e6661
fix mlas benchmark not using multi threads
35d02a6b
profiling
b9493adb
Merge branch 'liqun/mlas-q4-tile-avx' of https://github.com/microsoft…
c443eb5b
sgemm after sq4bit for avx2
ac66951c
avx512
42a13056
layout to follow compute, M1 separate with M > 1
740031ac
github-advanced-security commented on 2024-06-28
make avx512 run
1a6031e6
Merge branch 'main' into liqun/mlas-q4-tile-avx
283fd2dd
avx512 blklen64 pass
d0359391
github-advanced-security commented on 2024-07-04
pass avx512 blklen32
f329d2dd
pass avx512 blklen 16, 128, 256
27cfd9c7
pass fp32, refactor sqnbitgemm
edee3198
merge main
fb9221a7
avx512vnni
c109b4b2
merge main
6654d22c
avxvnni
4b91bedb
rm unused ComputeParallelTasksSGemm
8674b9f1
avoid _mm256_dpbusds_avx_epi32 in avx512vnni
e26e29e8
fix linux build
2b0307e0
Merge branch 'main' into liqun/mlas-q4-tile-avx
40df7827
refactor for Arm64
51e97c8f
more refactor for Arm64
48e8639d
hsum_float_16
705aa1f2
hsum_float_16
012e9c46
condition for -mavxvnni
21b9138f
yufenglee commented on 2024-07-30
yufenglee commented on 2024-07-30
CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 10
1fb1c83e
yufenglee commented on 2024-07-30
missed 2 files from (__GNUC__ > 10)
85918e98
yufenglee commented on 2024-07-30
missed _mm256_dpbusds_avx_epi32 and print out cmake msgs
9530ac56
unused zp, etc.
f77cffd4
unused zp, etc.
a6fd3788
remove test code changes
c875e5c9
remove test code changes
3b56710e
lint
746562f6
liqunfu marked this pull request as ready for review 1 year ago
lint
52fc7fa8
code name
0933a6b8
edgchen1 commented on 2024-07-30
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
yufenglee commented on 2024-07-31
update reviewers' comments
2b35c820
edgchen1 commented on 2024-07-31
Merge branch 'main' into liqun/mlas-q4-tile-avx
caeb35eb
yufenglee approved these changes on 2024-08-01
liqunfu merged b87e8edb into main 1 year ago
liqunfu deleted the liqun/mlas-q4-tile-avx branch 1 year ago
prathikr added release:1.19.0
prathikr added cherry-picked
snnn commented on 2024-10-16
Reviewers
yufenglee
edgchen1
snnn
github-advanced-security
chenfucn
Assignees
No one assigned
Labels
cherry-picked
release:1.19.0
Milestone
No milestone