Add padded, torch.nanmean benchmark for jagged_mean operator (#2353)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2353
Add to TritonBench a `jagged_mean` reduction operator benchmark for nested tensors that uses the PyTorch `torch.nanmean` function and [`torch.ops.aten._jagged_to_padded_dense_forward`](https://www.internalfb.com/code/fbsource/[92c2a067ab04e3eebc999254fed4ae2fbea6def3]/fbcode/deeplearning/fbgemm/fbgemm_gpu/fb/inductor_lowerings/elementwise_ops.py?lines=26). This diff implements a basic benchmark for reducing along the ragged dimension of 3-dimensional jagged tensors. For a 3-dimensional tensor of shape `(B, *, M)`, where `*` is the ragged dimension, this benchmark pads each of the `B` 2-dimensional `(*, M)` tensors with `NaN` to a common length, then uses `torch.nanmean` to compute the mean of each padded dense tensor along the ragged dimension, ignoring the `NaN` padding values.
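A minimal sketch of the padded `torch.nanmean` approach, assuming `_jagged_to_padded_dense_forward` accepts packed values, a list of offsets, max lengths, and a padding value (the op may require a recent PyTorch/fbgemm build); the shapes and names below are illustrative, not the benchmark's actual inputs:

```python
import torch

# Illustrative shapes: B = 3 jagged rows of varying length, inner dim M = 4.
B, M = 3, 4
lengths = torch.tensor([2, 5, 3])
offsets = torch.zeros(B + 1, dtype=torch.int64)
offsets[1:] = torch.cumsum(lengths, dim=0)    # [0, 2, 7, 10]
values = torch.randn(int(offsets[-1]), M)     # packed (sum of lengths, M) values

# Pad each (*, M) slice with NaN up to the maximum ragged length so the
# ragged dimension can be reduced with a dense kernel.
padded = torch.ops.aten._jagged_to_padded_dense_forward(
    values,
    [offsets],
    [int(lengths.max())],
    padding_value=float("nan"),
)                                             # shape (B, max(lengths), M)

# Mean along the padded ragged dimension, ignoring the NaN padding.
result = torch.nanmean(padded, dim=1)         # shape (B, M)
```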
Extend plotting functionality for the `jagged_mean` operator to account for the new benchmark. Add an `accuracy` metric to verify that the results of all existing benchmarks match.
This diff follows the general framework found in the `jagged_sum` operator (D58396957, D59034792).
Reviewed By: davidberard98
Differential Revision: D59146024
fbshipit-source-id: 37bc2bc50a56167097a985466df5432c98701cba