Enhance Tensor indexSelect performance (#23055)
Summary:
This tries to reduce the overhead of index_select on the CPU path in DLRM (https://github.com/facebookresearch/dlrm). Making src contiguous lets it take the parallelized path in the Tensor indexSelect function.
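
A minimal pure-Python sketch of the idea, not the actual TH/ATen code: once the source is densely packed in row-major order, each selected row is an independent slice copy, so the rows can be handed to separate workers. The helper names `make_contiguous` and `index_select_rows` are hypothetical, and the thread pool only illustrates that the row copies are independent; it is not expected to speed anything up in Python.

```python
from concurrent.futures import ThreadPoolExecutor


def make_contiguous(data, offset, shape, strides):
    """Copy a strided 2-D view of a flat buffer into a dense row-major list."""
    rows, cols = shape
    row_stride, col_stride = strides
    return [
        data[offset + r * row_stride + c * col_stride]
        for r in range(rows)
        for c in range(cols)
    ]


def index_select_rows(src, cols, index, workers=4):
    """Select rows from a dense row-major buffer.

    Because src is contiguous, row i is exactly src[i*cols:(i+1)*cols],
    so every output row can be copied independently of the others.
    """
    def copy_row(i):
        return src[i * cols:(i + 1) * cols]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        selected = list(pool.map(copy_row, index))  # map preserves index order
    return [value for row in selected for value in row]


# A 3x2 matrix stored column by column, i.e. non-contiguous in row-major terms:
# element (r, c) lives at data[r * 1 + c * 3].
data = [0, 1, 2, 10, 11, 12]
dense = make_contiguous(data, 0, (3, 2), (1, 3))
result = index_select_rows(dense, 2, [2, 0, 2])
```

The one-time cost of the dense copy is assumed to be paid back when many rows are selected, as in the embedding lookups DLRM performs.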
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23055
Differential Revision: D16603913
Pulled By: ezyang
fbshipit-source-id: baaa02f184a8e70f1193e5d96ada195a46d140b9