benchmark
f4cbf782 - Extend support to varying block sizes on both dimensions for 2D matrices (#2302)

Committed 1 year ago
Summary:
Pull Request resolved: https://github.com/pytorch/benchmark/pull/2302

Extend support for reducing across individual dimensions of 2-dimensional matrices by allowing varying block sizes on both the `M` (first) and `N` (second) dimensions.

The existing kernel performed a simplified reduction that assumed the entire reduction dimension fit within a single thread block. The new kernel implementation removes this assumption, allowing both the reduction and the non-reduction dimensions to span multiple thread blocks. This implementation also enables autotuning of block sizes for both the `M` and `N` dimensions.

For reductions that produce a 1D result, add a `sum_then_buffer` configuration that selects which kernel variant to run:
- `sum_then_buffer` sums individual blocks of input and adds these sums into a buffer.
- `buffer_then_sum` adds blocks of raw input into a buffer, then reduces the buffer.

Reviewed By: davidberard98

Differential Revision: D58313958

fbshipit-source-id: 639ea6b7d7b92f478c0f5669a1cdc0dcb68004e3
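The blocking scheme and the two 1D-result strategies in this commit message can be sketched in plain Python. This is a minimal CPU emulation under stated assumptions, not the Triton kernels from this commit. The function name `blocked_sum`, its parameters `block_m`/`block_n`, and the use of `sum_then_buffer` as a Python boolean are hypothetical. Each outer-loop iteration stands in for one program (thread block) that owns a block of the non-reduction dimension and loops over the reduction dimension in blocks, so neither dimension has to fit in a single tile. Out-of-range elements are masked to zero.

```python
from typing import List

Matrix = List[List[float]]


def blocked_sum(x: Matrix, dim: int, block_m: int, block_n: int,
                sum_then_buffer: bool = True) -> List[float]:
    """Hypothetical CPU emulation of a tiled reduction of a 2D matrix along `dim`."""
    m, n = len(x), len(x[0])
    # Length of the reduction dimension and of the output (non-reduction) dimension.
    red_len, out_len = (m, n) if dim == 0 else (n, m)
    # Block size along each: block_m tiles M, block_n tiles N.
    block_red, block_out = (block_m, block_n) if dim == 0 else (block_n, block_m)

    def elem(r: int, o: int) -> float:
        # Element at reduction index r and output index o; out-of-range
        # indices return 0.0, mimicking masked loads in a GPU kernel.
        if r >= red_len or o >= out_len:
            return 0.0
        return x[r][o] if dim == 0 else x[o][r]

    out = [0.0] * out_len
    # One iteration per "program", each owning a block of output positions.
    for out_start in range(0, out_len, block_out):
        if sum_then_buffer:
            # Buffer holds one running partial sum per output position.
            vec_buf = [0.0] * block_out
            for red_start in range(0, red_len, block_red):
                for j in range(block_out):
                    # Reduce this tile along the reduction axis first...
                    tile_sum = sum(elem(red_start + i, out_start + j)
                                   for i in range(block_red))
                    # ...then add the per-tile sum into the buffer.
                    vec_buf[j] += tile_sum
            partial = vec_buf
        else:
            # Buffer has the full tile shape; raw values accumulate elementwise.
            tile_buf = [[0.0] * block_out for _ in range(block_red)]
            for red_start in range(0, red_len, block_red):
                for i in range(block_red):
                    for j in range(block_out):
                        tile_buf[i][j] += elem(red_start + i, out_start + j)
            # Reduce the buffer once, after every tile has been added.
            partial = [sum(tile_buf[i][j] for i in range(block_red))
                       for j in range(block_out)]
        # Write back only the in-range output positions.
        for j in range(block_out):
            if out_start + j < out_len:
                out[out_start + j] = partial[j]
    return out


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(blocked_sum(x, dim=0, block_m=1, block_n=2))  # column sums over M
print(blocked_sum(x, dim=1, block_m=2, block_n=2, sum_then_buffer=False))  # row sums over N
```

Both variants compute the same result; they differ in buffer shape. `sum_then_buffer` keeps one value per output position, while `buffer_then_sum` keeps a tile-sized buffer and reduces it only once at the end. Which variant performs better on real hardware is not something this sketch addresses.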