Add CUTLASS + PT2-Triton kernels to gemm benchmark
Summary:
I did this by restricting the max_autotune backend to only CUTLASS or only TRITON, depending on which kernel is being benchmarked.
I also modified the baseline benchmark to explicitly disable autotuning, so that we can be more confident that it is invoking the ATen kernel.
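For reference, a minimal sketch of the per-variant settings described above. This is not the code in this diff. The helper name `inductor_options` is hypothetical, and the keys `max_autotune` and `max_autotune_gemm_backends` are assumed to be the Inductor config options accepted through `torch.compile(..., options=...)`.

```python
from typing import Any, Dict, Optional


def inductor_options(gemm_backend: Optional[str]) -> Dict[str, Any]:
    """Build torch.compile options for one benchmark variant (hypothetical helper).

    gemm_backend=None is the baseline: autotuning is explicitly disabled, so
    the matmul is expected (not guaranteed) to go to the ATen kernel.
    Otherwise autotuning is enabled and limited to the single named backend,
    e.g. "CUTLASS" or "TRITON".
    """
    if gemm_backend is None:
        return {"max_autotune": False}
    return {
        "max_autotune": True,
        # Assumed config key: comma-separated list of allowed GEMM backends.
        "max_autotune_gemm_backends": gemm_backend,
    }


def compile_matmul(gemm_backend: Optional[str]):
    """Compile torch.matmul for one variant; torch is imported lazily."""
    import torch  # only needed when a variant is actually compiled

    return torch.compile(torch.matmul, options=inductor_options(gemm_backend))
```

Usage would look like `compile_matmul("CUTLASS")`, `compile_matmul("TRITON")`, and `compile_matmul(None)` for the baseline.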
Reviewed By: bertmaher, xuzhao9, chenyang78
Differential Revision: D56685216
fbshipit-source-id: 1638266254690b929f8c5591a194127c6a7c7be8