1fe2a9d1 - Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Add an `_int_mm` primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max-autotune. This is a very useful first step toward better supporting quantization on the GPU. This is not a user-facing API, but an internal primitive.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
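Since the message describes the primitive without showing a call, here is a minimal sketch of exercising it directly. Because `_int_mm` is internal, the exact shape/alignment constraints cuBLAS imposes are an assumption here; the chosen shapes are simply ones that commonly satisfy them.

```python
import torch

# Minimal sketch (not from the commit): calling the internal primitive
# directly on CUDA int8 tensors. torch.randint's upper bound is exclusive,
# so (-128, 128) yields values in the valid int8 range [-128, 127].
a = torch.randint(-128, 128, (32, 64), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 128, (64, 32), dtype=torch.int8, device="cuda")

c = torch._int_mm(a, b)  # int8 @ int8 -> int32, via cuBLAS
assert c.dtype == torch.int32

# Cross-check against an int32 matmul on CPU (integer matmul is supported
# there); the integer results should match exactly.
ref = a.cpu().to(torch.int32) @ b.cpu().to(torch.int32)
torch.testing.assert_close(c.cpu(), ref)
```

Per the commit message, when such a call is compiled with max-autotune (e.g. `torch.compile(fn, mode="max-autotune")`), Inductor can swap the cuBLAS binding for a Triton-based mm template.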