Add padded, torch.sum benchmark for jagged_mean operator (#2354)
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2354
Add a `jagged_mean` reduction operator benchmark for nested tensors to TritonBench, using the PyTorch `torch.sum` function and [`torch.ops.aten._jagged_to_padded_dense_forward`](https://www.internalfb.com/code/fbsource/[92c2a067ab04e3eebc999254fed4ae2fbea6def3]/fbcode/deeplearning/fbgemm/fbgemm_gpu/fb/inductor_lowerings/elementwise_ops.py?lines=26).
This diff implements a benchmark for reducing along the ragged dimension of 3-dimensional jagged tensors. For a 3-dimensional tensor of shape `(B, *, M)`, where `*` is the ragged dimension, the benchmark pads each variable-length 2-dimensional slice with zeros up to the maximum ragged length, yielding a dense tensor. It then divides the `sum` of the padded tensor along the ragged dimension by each slice's true length along `*`, computed with `x.offsets().diff()`; a sketch of this approach is shown below.
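A minimal sketch of the padded `torch.sum` approach, not the exact benchmark code: it assumes `x` is a nested tensor with `layout=torch.jagged` on a CUDA device, that the aten op accepts `(values, offsets, max_lengths, padding_value)`, and a hypothetical `max_seqlen` parameter giving the maximum length along `*`:

```python
import torch

def jagged_mean_padded_sum(x: torch.Tensor, max_seqlen: int) -> torch.Tensor:
    # `x` is assumed to be a nested tensor with layout=torch.jagged and shape
    # (B, *, M): `x.values()` is (total_rows, M), `x.offsets()` is (B + 1,).
    # Pad each variable-length (seqlen_i, M) slice with zeros up to max_seqlen,
    # producing a dense (B, max_seqlen, M) tensor.
    padded = torch.ops.aten._jagged_to_padded_dense_forward(
        x.values(), [x.offsets()], [max_seqlen], 0.0
    )
    # Zero padding contributes nothing to the sum along the ragged dimension,
    # so dividing by each slice's true length yields the mean. The lengths come
    # from an on-device `diff()` over the offsets, so no GPU/CPU sync occurs.
    lengths = x.offsets().diff().unsqueeze(1)  # (B, 1)
    return torch.sum(padded, dim=1) / lengths  # (B, M)

# Example: a jagged tensor with B = 2, ragged lengths 3 and 5, M = 8.
x = torch.nested.nested_tensor(
    [torch.randn(3, 8), torch.randn(5, 8)],
    layout=torch.jagged,
    device="cuda",
)
print(jagged_mean_padded_sum(x, max_seqlen=5).shape)  # torch.Size([2, 8])
```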
This benchmark avoids a GPU/CPU sync and is therefore faster than the two previous PyTorch benchmarks: D59144906, which incurs a GPU/CPU sync, and D59146024, which uses the unoptimized `torch.nanmean` function.
Reviewed By: davidberard98
Differential Revision: D59245842
fbshipit-source-id: f860b0d8bc98e27bb4dbea8dc44fac185ce5529f