Vectorize reduction when reducing along the fastest-striding dimension (#36709)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36709
Test Plan: Imported from OSS
Differential Revision: D21083393
Pulled By: ngimel
fbshipit-source-id: ea3f7f29709c9a6e5b3ec45ba809cb2cf6c5e0c8