Turn on TMA by default for row-wise GEMM (#2450)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2450
X-link: https://github.com/facebookresearch/FBGEMM/pull/189
Enable the TMA row-wise GEMM by default, since TMA appears to give a speedup across the board, up to 40% for some shapes.
Reviewed By: choutim
Differential Revision: D62212842
fbshipit-source-id: 59220cec90e222fe91be9f53a3477f1c38e02e2a