onnxruntime
7801c513 - [x86] matmulnbit x64 kernel for 8bits (#24491)

Commit
316 days ago
[x86] matmulnbit x64 kernel for 8bits (#24491) ### Description Add 8bits support for matmulnbits on x86 __AVX512 VNNI__ | M | N | K | 8-bit Time (ns) | 4-bit Time (ns) | Slow down (8-bit / 4-bit) | |:-----:|:-------:|:-------:|:----------------:|:----------------:|:------------------------:| | 1 | 4096 | 4096 | 34145 | 27723 | **1.23×** | | 1 | 11008 | 4096 | 415285 | 68656 | **6.05×** | | 1 | 4096 | 11008 | 407801 | 68061 | **5.99×** | | 1 | 11008 | 11008 | 2674538 | 1003532 | **2.67×** | | 4096 | 4096 | 4096 | 80338759 | 86321713 | **0.93×** | | 4096 | 11008 | 4096 | 213421935 | 225245276 | **0.95×** | | 4096 | 4096 | 11008 | 240164365 | 228966953 | **1.05×** | | 4096 | 11008 | 11008 | 628352046 | 596738340 | **1.05×** | __AVX512__ | M | N | K | 8-bit Time (ns) | 4-bit Time (ns) | Slow down (8-bit / 4-bit) | |:-----:|:-------:|:-------:|:----------------:|:----------------:|:------------------------:| | 1 | 4096 | 4096 | 53324 | 37882 | **1.41×** | | 1 | 11008 | 4096 | 244560 | 103255 | **2.37×** | | 1 | 4096 | 11008 | 435131 | 95734 | **4.55×** | | 1 | 11008 | 11008 | 2790710 | 1075216 | **2.60×** | | 4096 | 4096 | 4096 | 200629000 | 132841540 | **1.51×** | | 4096 | 11008 | 4096 | 532141914 | 350613184 | **1.52×** | | 4096 | 4096 | 11008 | 544011977 | 351679619 | **1.55×** | | 4096 | 11008 | 11008 | 1421865147 | 925593210 | **1.54×** | Token generation is bottlenecked at memory access. 8b model's 2x size is major reason of token generation slow down. For non-vnni platform, the i16 cannot fit in 4 i8. To avoid overflow extra instructions are needed. This is the major reason of non-vnni slow down. ### Motivation and Context MatMul4Bits model has repetition issue. 6b model resolved this issue.
Author
Parents
Loading