Reland masked_select CUDA port from TH to ATen (#36539)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33054
Relanding PR https://github.com/pytorch/pytorch/issues/35429
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36539
Differential Revision: D21007226
Pulled By: ngimel
fbshipit-source-id: 3c66ad073ff8e767ad120bc94120379d40346018