ebd00aac - tritonbench bf16xint16 matmul template (#2348)

Summary: Pull Request resolved: https://github.com/pytorch/benchmark/pull/2348

Overall context: before looking further into the bf16xint4 matmul, I'm planning to look into a bf16xint16 matmul first. The idea is that this matmul will be the same as a bf16xbf16 matmul, except that the second operand needs to be cast from int16 to bf16 inside the Triton kernel before the dot product executes.

This PR is NOT fully functional yet; it's implemented this way to make review easier. Three kernels will be benchmarked here:
1. bf16xbf16 Triton kernel - selected as the "baseline" because, ideally, the bf16xint16 kernel should get as close as possible to this kernel.
2. bf16xint16 Triton kernel - NOT implemented yet; it will be implemented in a follow-up PR.
3. bf16x(convert(int16 -> bf16)) Triton kernel - i.e., convert the int16 operand to bf16, write it to global memory, and then run the bf16xbf16 kernel.

Differential Revision: D59234085 (imported-using-ghimport)
Test Plan: Imported from OSS
Reviewed By: xuzhao9
Pulled By: davidberard98
fbshipit-source-id: 75a493dbd78ee1aa1f63926f6dd61a2e7388816c
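As an illustration only (not the PR's Triton code), kernel 3's strategy — materialize the converted operand in global memory, then run the ordinary matmul — can be sketched in NumPy. The function name `matmul_convert_first` is a hypothetical helper, and `np.float32` stands in for bf16 since NumPy has no bf16 dtype:

```python
import numpy as np

def matmul_convert_first(a_f32: np.ndarray, b_i16: np.ndarray) -> np.ndarray:
    # Kernel 3's approach: convert the int16 operand up front,
    # writing the converted tensor out in full (in the real benchmark,
    # to global GPU memory), then run the plain float x float matmul.
    b_f32 = b_i16.astype(np.float32)  # stand-in for int16 -> bf16 cast
    return a_f32 @ b_f32

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.integers(-100, 100, size=(8, 3), dtype=np.int16)
out = matmul_convert_first(a, b)
print(out.shape)  # (4, 3)
```

Kernel 2 would instead perform the equivalent of `b_i16.astype(...)` on tiles inside the matmul kernel itself, avoiding the extra round trip through global memory — which is why kernel 1 (pure bf16xbf16) serves as the performance target.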