Add jagged_sum operator for unpadded nested tensors to TritonBench (#2299)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2299
Add a `jagged_sum` reduction operator for unpadded nested tensors, based on the PyTorch `sum` operator, to TritonBench. This diff implements a basic benchmark that reduces 3-dimensional nested tensors along the ragged dimension. For a nested tensor of shape `(B, *, M)`, where `*` is the ragged dimension, the benchmark uses PyTorch's `sum` operator to reduce each of the `B` 2-dimensional `(*, M)` tensors along its ragged dimension, producing a `(B, M)` output tensor.
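For illustration only, the reduction described above can be sketched in plain PyTorch as follows. This is a minimal sketch, not the code added in this diff: the function name `jagged_sum_reference` and the example lengths are hypothetical, and the ragged input is modeled as a Python list of `(len_i, M)` tensors rather than a nested tensor object.

```python
import torch


def jagged_sum_reference(components):
    """Hypothetical reference: reduce B tensors of shape (len_i, M) to (B, M).

    Each component is summed along dim 0 (its ragged dimension), and the
    resulting M-length rows are stacked into a dense (B, M) tensor.
    """
    return torch.stack([t.sum(dim=0) for t in components])


# Assumed example sizes: B = 3 components with ragged lengths 2, 1, 4 and M = 5.
M = 5
components = [torch.ones(n, M) for n in (2, 1, 4)]
out = jagged_sum_reference(components)  # dense tensor of shape (B, M)
```

Because every input element is 1, each output row holds the length of the corresponding component, which makes the result easy to check by hand.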
Measure the performance of this basic benchmark with the `gbps` and `latency` metrics, and display the nested tensor parameters `B` and `M` alongside the results.
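As a rough illustration of how a `gbps` figure relates to `latency`, the sketch below divides the bytes read and written by the elapsed time. The helper names and the byte-counting convention (input elements plus output elements) are assumptions made for this example and are not taken from the TritonBench implementation.

```python
def bytes_moved(total_rows, M, B, element_size):
    """Assumed convention: count every input element read and output element written.

    total_rows is the sum of all ragged lengths across the B components.
    """
    return (total_rows * M + B * M) * element_size


def gbps(num_bytes, latency_ms):
    """Gigabytes per second for num_bytes moved in latency_ms milliseconds."""
    seconds = latency_ms / 1000.0
    return num_bytes / seconds / 1e9


# Hypothetical numbers: 7 ragged rows in total, M = 5, B = 3, 4-byte floats.
example_bytes = bytes_moved(total_rows=7, M=5, B=3, element_size=4)
```

Under this convention, a run that moves more bytes in the same latency reports a higher `gbps`, so the two metrics are best read together with `B` and `M`.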
Reviewed By: YuqingJ
Differential Revision: D58396957
fbshipit-source-id: bb1d88184006f19dc61f8420e5c16a1fcce24fe0