pytorch
30befa59 - add int4 packed gemm support on CPU device (#117475)

This patch adds int4 packed gemm support on CPU; both `avx512` and `avx2` are supported. It is used to speed up https://github.com/pytorch-labs/gpt-fast

Perf measured on Intel(R) Xeon(R) CPU Max 9480, single socket (56 cores):

* default: `16.13 sec total, 12.40 tokens/sec`
* WOQ int4 on avx512: `5.92 sec total, 33.79 tokens/sec`
* WOQ int4 on avx2: `6.90 sec total, 29.00 tokens/sec`

WOQ int4 is measured with the method described at https://github.com/pytorch-labs/gpt-fast?tab=readme-ov-file#int4-weight-only-quantization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117475
Approved by: https://github.com/jgong5, https://github.com/malfet
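To make the "int4 packed" idea concrete: weight-only quantization (WOQ) stores two 4-bit weight values per byte, with a shared float scale per group of weights, halving storage relative to int8 and quartering it relative to fp32. The sketch below is a minimal, pure-Python illustration of that packing/dequantization scheme, not the actual avx512/avx2 kernel from this patch; the group size of 4 and the symmetric round-to-nearest quantizer are assumptions for clarity (the real kernels use larger group sizes and vectorized code).

```python
# Illustrative int4 weight-only quantization: pack two 4-bit values per
# byte, one float scale per group. Hypothetical group size; the real
# CPU kernels operate on much larger, SIMD-friendly groups.
GROUP_SIZE = 4

def quantize_int4(weights):
    """Quantize floats to unsigned int4 (0..15) with a per-group scale."""
    qvals, scales = [], []
    for g in range(0, len(weights), GROUP_SIZE):
        group = weights[g:g + GROUP_SIZE]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # avoid scale == 0
        scales.append(scale)
        # map to signed int4 range [-8, 7], store with +8 offset
        qvals.extend(max(-8, min(7, round(w / scale))) + 8 for w in group)
    return qvals, scales

def pack_int4(qvals):
    """Pack two 4-bit values into each byte (low nibble first)."""
    packed = []
    for i in range(0, len(qvals), 2):
        lo = qvals[i]
        hi = qvals[i + 1] if i + 1 < len(qvals) else 0
        packed.append(lo | (hi << 4))
    return bytes(packed)

def unpack_dequant(packed, scales, n):
    """Recover approximate float weights from packed bytes and scales."""
    out = []
    for i in range(n):
        byte = packed[i // 2]
        nib = byte & 0xF if i % 2 == 0 else byte >> 4
        out.append((nib - 8) * scales[i // GROUP_SIZE])
    return out

w = [0.5, -1.0, 0.25, 0.75, 2.0, -2.0, 1.0, 0.0]
q, s = quantize_int4(w)
packed = pack_int4(q)          # 4 bytes for 8 weights
recovered = unpack_dequant(packed, s, len(w))
```

In the actual gemm path the weights stay packed in memory and are unpacked and dequantized on the fly inside the vectorized inner loop, which is where the avx512/avx2 speedup over the fp32 baseline comes from.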