Add CPU implementation for `torch._int_mm` (s8*s8->s32) (#121792)
Fixes #121647
**Description**
Currently, the op `torch._int_mm` (signed int8 × signed int8 → int32 matrix multiplication) only supports the CUDA device. This PR adds a CPU implementation for it.
Beyond the request in the issue, this op may also be useful for planned CPU implementations of [LLM.int8()](https://arxiv.org/abs/2208.07339) in [Bitsandbytes](https://github.com/TimDettmers/bitsandbytes).
The implementation prefers mkldnn (oneDNN) kernels. If mkldnn is not available, it falls back to a reference implementation using nested for loops.
**Test plan**
`python test/test_linalg.py -k test__int_mm_cpu`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121792
Approved by: https://github.com/jgong5, https://github.com/lezcano