Add torch.mean benchmark for jagged_mean operator (#2352)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2352
Add a `jagged_mean` reduction operator for nested tensors to TritonBench, implemented with the PyTorch `torch.mean` and `unbind` functions. This diff implements a basic benchmark for reducing along the ragged dimension of 3-dimensional jagged tensors. For a 3-dimensional tensor of shape `(B, *, M)`, where `*` is the ragged dimension, this benchmark applies PyTorch's `mean` operator to each of the `B` 2-dimensional `(*, M)` tensors, producing a `(B, M)` output tensor.
Add plotting functionality to the `jagged_mean` operator in TritonBench, enabling line plots for any set of benchmarks that varies along one of the following input parameters: `B`, `M`, `seqlen`, or `sparsity`. This diff lays the groundwork for visualizing the differences in `latency` among the benchmarks in the `jagged_mean` operator.
Measure the performance of the basic PyTorch benchmark using the `latency` and `gbps` metrics, along with a `latency` plot that varies along one input parameter. Display the nested tensor parameters in the benchmark output.
This diff follows the general framework found in the `jagged_sum` operator (D58396957, D59034792).
Reviewed By: davidberard98
Differential Revision: D59144906
fbshipit-source-id: 8eb29a3d543e716991c575f5553729ab409f3810