bf16xint16_gemm operator: add --transpose option (#2466)
Summary:
`--transpose` makes this benchmark test an int16 x bf16 matmul instead of a bf16 x int16 one.
This matters on H100 because the wgmma instruction can take register operands only on the LHS, so int16 x bf16 is probably the easier variant to support efficiently.
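For illustration, a minimal PyTorch sketch of the two orderings the flag switches between; the tensor names, shapes, and the explicit upcast are assumptions for the sketch, not the operator's actual code:

```python
# Hypothetical sketch (not the operator's code) of the two matmul orderings.
import torch

M, K, N = 4096, 4096, 4096
a_bf16 = torch.randn(M, K, dtype=torch.bfloat16, device="cuda")
b_int16 = torch.randint(-8, 8, (K, N), dtype=torch.int16, device="cuda")

# Default: bf16 x int16 -- the int16 operand sits on the RHS and is upcast first.
out = a_bf16 @ b_int16.to(torch.bfloat16)

# --transpose: int16 x bf16 -- the int16 operand moves to the LHS, the side that
# wgmma on H100 can feed from registers, so a fused upcast-then-mm kernel is
# likely easier to support efficiently for this ordering.
a_int16 = torch.randint(-8, 8, (M, K), dtype=torch.int16, device="cuda")
b_bf16 = torch.randn(K, N, dtype=torch.bfloat16, device="cuda")
out_t = a_int16.to(torch.bfloat16) @ b_bf16
```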
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2466
Test Plan:
In OSS: ran `python run_benchmark.py triton --op bf16xint16_gemm --transpose`
Internally, ran `buck2 run mode/opt //pytorch/benchmark:triton -- --op bf16xint16_gemm --transpose`
Internally, we run into the issue fixed by https://github.com/triton-lang/triton/pull/4695, but otherwise both commands run.
Reviewed By: aakhundov
Differential Revision: D63294109
Pulled By: davidberard98
fbshipit-source-id: 3ea05bb09e62f51c405ae538726caf80e1ba0d63