pytorch
1fe2a9d1 - Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Commit

1 year ago

Add _int_mm to expose cuBLAS int8@int8 -> int32 matmul (#94339)

Add an _int_mm primitive that binds the cuBLAS int8@int8 -> int32 matmul and that translates to Triton-based mm templates under max autotune.

This is a useful first step towards better support for quantization on the GPU.

This is not a user-facing API, but an internal primitive.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94339
Approved by: https://github.com/ngimel, https://github.com/jansel
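To illustrate what an int8@int8 -> int32 matmul computes, here is a minimal pure-Python sketch of the arithmetic. It is not the cuBLAS binding or the Triton template from this commit; the helper names `int_mm_reference` and `wrap_int8` are hypothetical and exist only for this illustration. The sketch assumes inputs are given as nested lists of integers in the int8 range and shows why products are accumulated in a wider type.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def int_mm_reference(a, b):
    """Hypothetical reference: multiply int8 matrices, accumulate as int32.

    `a` is an m x k nested list, `b` is a k x n nested list.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])

    # Reject values that would not fit in an int8 input tensor.
    for row in a + b:
        for v in row:
            if not INT8_MIN <= v <= INT8_MAX:
                raise ValueError(f"{v} is outside the int8 range")

    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            # Each product has magnitude at most 128 * 128, so the sum
            # stays inside int32 for any realistic k; check anyway.
            if not INT32_MIN <= acc <= INT32_MAX:
                raise OverflowError("accumulator exceeds int32")
            out[i][j] = acc
    return out


def wrap_int8(v):
    """Hypothetical helper: the value v would take if stored back into int8."""
    return ((v + 128) % 256) - 128


if __name__ == "__main__":
    result = int_mm_reference([[127, 127]], [[127], [127]])
    print(result)                   # wide accumulator keeps the full sum
    print(wrap_int8(result[0][0]))  # the same sum truncated to int8
```

Two products of 127 * 127 already exceed the int8 range, which is why the output type is int32 rather than int8. On a CUDA build that includes this commit, the corresponding call would presumably be `torch._int_mm(a, b)` with int8 CUDA tensors; since the commit describes it as an internal primitive, shape or alignment restrictions may apply and the interface may change.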

Author

cpuhrsch

Committer

pytorchmergebot

Parents

32558910
