add diag to pt operator microbenchmark (#32597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32597
Currently, there is no benchmark test for the diag operator. This diff adds one to the suite.
Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M64_N64_diagonal0_outTrue_cpu
# Input: dim: 1, M: 64, N: 64, diagonal: 0, out: True, device: cpu
Forward Execution Time (us) : 28.496
# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim2_M128_N128_diagonal-10_outFalse_cpu
# Input: dim: 2, M: 128, N: 128, diagonal: -10, out: False, device: cpu
Forward Execution Time (us) : 45.179
# Benchmarking PyTorch: diag
# Mode: Eager
# Name: diag_dim1_M256_N256_diagonal20_outTrue_cpu
# Input: dim: 1, M: 256, N: 256, diagonal: 20, out: True, device: cpu
Forward Execution Time (us) : 49.009
```
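Each benchmark name in the output above encodes its configuration. The Python sketch below is a hypothetical helper, not part of this diff. It decodes a name and computes the output shape `torch.diag` should produce for it. It assumes that the 1-D input (`dim: 1`) has length `M` and that the 2-D input is `M x N`. `diagonal` follows `torch.diag`'s convention: positive offsets lie above the main diagonal and negative offsets below it. The `out` flag presumably only selects whether the result is written into a preallocated tensor, so it does not affect the shape.

```python
import re
from typing import Dict, Tuple, Union

# Hypothetical pattern, inferred from the benchmark names printed in the Test Plan,
# e.g. "diag_dim1_M64_N64_diagonal0_outTrue_cpu".
NAME_RE = re.compile(
    r"^diag_dim(?P<dim>\d+)_M(?P<M>\d+)_N(?P<N>\d+)"
    r"_diagonal(?P<diagonal>-?\d+)_out(?P<out>True|False)_(?P<device>\w+)$"
)


def parse_config(name: str) -> Dict[str, Union[int, bool, str]]:
    """Turn a benchmark name back into its input attributes."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized benchmark name: {name}")
    g = m.groupdict()
    return {
        "dim": int(g["dim"]),
        "M": int(g["M"]),
        "N": int(g["N"]),
        "diagonal": int(g["diagonal"]),
        "out": g["out"] == "True",
        "device": g["device"],
    }


def diag_output_shape(cfg: Dict[str, Union[int, bool, str]]) -> Tuple[int, ...]:
    """Shape of torch.diag(input, diagonal) for the decoded configuration."""
    k = int(cfg["diagonal"])
    if cfg["dim"] == 1:
        # Assumption: the 1-D input has length M. torch.diag builds a square
        # matrix large enough to hold the vector on the k-th diagonal.
        n = int(cfg["M"]) + abs(k)
        return (n, n)
    # 2-D input: torch.diag extracts the k-th diagonal as a 1-D tensor.
    rows, cols = int(cfg["M"]), int(cfg["N"])
    length = min(rows, cols - k) if k >= 0 else min(rows + k, cols)
    return (max(0, length),)


if __name__ == "__main__":
    for name in [
        "diag_dim1_M64_N64_diagonal0_outTrue_cpu",
        "diag_dim2_M128_N128_diagonal-10_outFalse_cpu",
        "diag_dim1_M256_N256_diagonal20_outTrue_cpu",
    ]:
        print(name, diag_output_shape(parse_config(name)))
```

For the three configurations above, this gives shapes `(64, 64)`, `(118,)` and `(276, 276)`. The 1-D cases therefore build square matrices, while the 2-D case extracts a single diagonal.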
Reviewed By: mingzhe09088
Differential Revision: D19564024
fbshipit-source-id: 828a3e0e0e06810a77eb5ddb734efd30e4a63acf