Add persistent+TMA matmul to fp8 gemm benchmark (#2377)
Summary: Pull Request resolved: https://github.com/pytorch/benchmark/pull/2377
Reviewed By: xuzhao9, sijiac
Differential Revision: D59812172
Pulled By: bertmaher
fbshipit-source-id: 450229888e09c22b9cd11a37015e6d601ec919ce