Add cpu_serial_kernel_vec (#34553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34553
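This adds cpu_serial_kernel_vec, the vectorized counterpart of
cpu_serial_kernel. It allows a vectorized inner loop in a serial
(single-threaded) iteration over a TensorIterator.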
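
As a rough illustration of the idea (not the ATen implementation), the sketch below runs one loop on the calling thread, feeding full fixed-width chunks to a vector functor and the leftover tail to a scalar functor. The names `serial_kernel_vec_sketch`, `Pack`, `kWidth`, and `double_all` are hypothetical and chosen for this example only; the real function operates on a TensorIterator and uses ATen's vector types rather than raw pointers and `std::array`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD register: a fixed-width pack of floats.
constexpr std::size_t kWidth = 4;
using Pack = std::array<float, kWidth>;

// Hypothetical helper, NOT the ATen signature. Walks [0, n) on the calling
// thread only: full packs go through vop, the remaining tail through op.
template <typename ScalarOp, typename VecOp>
void serial_kernel_vec_sketch(const float* in, float* out, std::size_t n,
                              ScalarOp op, VecOp vop) {
  assert(n == 0 || (in != nullptr && out != nullptr));
  std::size_t i = 0;
  // Vectorized part: process kWidth elements per step.
  for (; i + kWidth <= n; i += kWidth) {
    Pack p;
    for (std::size_t k = 0; k < kWidth; ++k) p[k] = in[i + k];
    const Pack r = vop(p);
    for (std::size_t k = 0; k < kWidth; ++k) out[i + k] = r[k];
  }
  // Scalar tail: fewer than kWidth elements remain.
  for (; i < n; ++i) out[i] = op(in[i]);
}

// Example caller: doubles every element using matching scalar and vector ops.
inline std::vector<float> double_all(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  serial_kernel_vec_sketch(
      in.data(), out.data(), in.size(),
      [](float x) { return 2.0f * x; },
      [](Pack p) {
        for (auto& v : p) v *= 2.0f;
        return p;
      });
  return out;
}
```

The scalar and vector functors must compute the same result per element, since which one handles a given element depends only on its position relative to the pack width.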
Test Plan: Imported from OSS
Differential Revision: D20604238
Pulled By: ezyang
fbshipit-source-id: 61c451dac91d47cde7e1a937b271ab78c79e05d3