Vectorize reduction when reducing on fastest striding dimension [resubmit] (#36873)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36873
Differential Revision: D21109194
Pulled By: ngimel
fbshipit-source-id: eb18c6b4394f19a6c5eca45ef4ce97d623e051bd