benchmark
3ecaae9d - Add support for reducing across individual dimensions for 2D matrices using the sum Triton kernel (#2295)

Commit

2 years ago

Add support for reducing across individual dimensions for 2D matrices using the sum Triton kernel (#2295)

Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2295

Support reducing a 2-dimensional matrix across one dimension, where the `BLOCK_SIZE` in the reduced dimension is larger than the dimension size. This kernel performs a simplified reduction which assumes that the entire reduction dimension of the tensor fits in a thread block. The implementation handles toggling between block sizes for the `M` and `N` dimensions depending on the reduction dimension. For example, this kernel will reduce across the 0-th dimension for a (M, N) = (16, 16) matrix where `BLOCK_SIZE_M >= 16` and `BLOCK_SIZE_N` is autotuned.

Add a `best_config` metric to find the best `BLOCK_SIZE` for the non-reduction dimension and `num_warps` given some input size.

Reviewed By: jbschlosser

Differential Revision: D58261858

fbshipit-source-id: 8995c91c54e9792b52f4608446e8e940027a604d
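The blocking scheme described in the summary can be illustrated with a small pure-Python emulation. This is only a sketch and not the Triton kernel from the commit: the names `next_power_of_2`, `blocked_sum`, and `block_size_non_reduce` are hypothetical, and rounding the reduction block up to a power of two is an assumption about how a block covering the whole reduced dimension would be sized. It shows one block spanning the entire reduction dimension, out-of-bounds lanes masked to zero, and the non-reduction dimension split into tiles of a tunable size.

```python
def next_power_of_2(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def blocked_sum(x, dim, block_size_non_reduce=4):
    """Sum a 2D list-of-lists matrix along `dim` (0 or 1).

    Hypothetical emulation: each "program" owns one tile of the
    non-reduction dimension and reduces the full reduction dimension
    in a single block.
    """
    M, N = len(x), len(x[0])
    reduce_len = M if dim == 0 else N
    keep_len = N if dim == 0 else M

    # Assumption: the reduction block covers the whole reduced dimension,
    # rounded up to a power of two.
    block_size_reduce = next_power_of_2(reduce_len)

    out = [0] * keep_len
    num_programs = (keep_len + block_size_non_reduce - 1) // block_size_non_reduce
    for pid in range(num_programs):
        for lane in range(block_size_non_reduce):
            k = pid * block_size_non_reduce + lane
            if k >= keep_len:
                continue  # masked-out lane in the non-reduction dimension
            acc = 0
            for r in range(block_size_reduce):
                if r < reduce_len:  # masked load: padding lanes contribute 0
                    acc += x[r][k] if dim == 0 else x[k][r]
            out[k] = acc
    return out


matrix = [[1, 2, 3],
          [4, 5, 6]]
col_sums = blocked_sum(matrix, dim=0)  # reduce across the 0-th dimension (M)
row_sums = blocked_sum(matrix, dim=1)  # reduce across the 1st dimension (N)
```

Changing `block_size_non_reduce` only changes how the work is tiled and leaves the result unchanged, which is why a block size of that kind can be left to an autotuner.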

Author

jananisriram

Committer

facebook-github-bot

Parents

c13df576
