Swap the detection order in randperm_out_cuda to avoid an unnecessary conversion from float when the input is small.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22103
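
The reordering named in the title can be illustrated with a minimal, self-contained sketch. It is not the actual ATen code: the Path enum, the two choose_path_* functions, the needs_float_conversion flag, and the kSmallThreshold value are all hypothetical stand-ins chosen for illustration. The sketch only assumes that one check looks at the input size and the other decides whether a float intermediate is needed, and shows how checking the size first lets small inputs skip the conversion.

```cpp
#include <cstdint>

// Hypothetical labels for the code paths; not names from the PyTorch source.
enum class Path { SmallDirect, ViaFloatConversion, LargeDirect };

// Assumed cutoff for a "small" input, for illustration only.
constexpr std::int64_t kSmallThreshold = 30000;

// Assumed order before the swap: the conversion check runs first,
// so even a small input is routed through the float intermediate.
Path choose_path_before(bool needs_float_conversion, std::int64_t n) {
  if (needs_float_conversion) return Path::ViaFloatConversion;
  if (n < kSmallThreshold) return Path::SmallDirect;
  return Path::LargeDirect;
}

// Order after the swap: the size check runs first,
// so a small input never reaches the conversion branch.
Path choose_path_after(bool needs_float_conversion, std::int64_t n) {
  if (n < kSmallThreshold) return Path::SmallDirect;
  if (needs_float_conversion) return Path::ViaFloatConversion;
  return Path::LargeDirect;
}
```

Under these assumptions, the two orderings differ only when the input is small and a conversion would otherwise be requested; large inputs take the same path either way.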
Test Plan: Imported from OSS
Differential Revision: D16153585
Pulled By: li-roy
fbshipit-source-id: 0801b91e7b352c8de8fdfbe929be85d69182b8da