Fix resource consumption in reductions (#89144)
Reductions along a sufficiently large contiguous dimension vectorise
the loading of their inputs. This vectorisation was not taken into
account when computing the resources the kernel requires.
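
As a rough illustration of why the vector width matters, here is a toy model. It is not the code changed in this PR: the function names, the byte budget, and the thread cap are all hypothetical. It assumes each thread holds one value per vector lane, so its storage scales with the vector width and an estimate that assumes a width of 1 undercounts.

```python
def per_thread_bytes(elem_bytes: int, vec_width: int = 1) -> int:
    """Bytes one thread needs to hold a single (possibly vectorised) load.

    Toy model: one value of elem_bytes per vector lane.
    """
    return elem_bytes * vec_width


def max_threads(budget_bytes: int, elem_bytes: int,
                vec_width: int = 1, cap: int = 1024) -> int:
    """Largest thread count whose combined storage fits in budget_bytes.

    The result never exceeds cap (a hypothetical per-block thread limit).
    """
    return min(cap, budget_bytes // per_thread_bytes(elem_bytes, vec_width))


# Hypothetical numbers: 16-byte elements, 4-wide loads, an 8 KiB budget.
naive = max_threads(8192, 16)                   # ignores the vector width
corrected = max_threads(8192, 16, vec_width=4)  # accounts for it
```

With these made-up numbers the naive estimate allows four times as many threads as the budget can actually hold once the 4-wide loads are counted.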
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89144
Approved by: https://github.com/zasdfgbnm, https://github.com/ngimel