[MLAS][AArch64] SQ4BitGemm CompInt8 multi-block implementation #19826
add threads=4 benchmark tests
3ed0c314
blklen 32+ multi block impl
7812aa30
blklen 16 multi block impl
9276066c
make scale a vector
3c37356f
edgchen1
changed the title from "[MLAS][AArch64] SQNBitGemm CompInt8 multi-block implementation" to "[MLAS][AArch64] SQ4BitGemm CompInt8 multi-block implementation" 1 year ago
add cast to const uint8_t*
6038ca03
use correct vreinterpret call
474d11fe
add benchmark that gets args from environment variables
3b474b83
edgchen1
marked this pull request as ready for review 1 year ago
Merge remote-tracking branch 'origin/main' into edgchen1/sqnbitgemm_m…
ec786ec5
yufenglee
approved these changes
on 2024-03-14
edgchen1
merged
0b90363a
into main 1 year ago
edgchen1
deleted the edgchen1/sqnbitgemm_multiblock branch 1 year ago