improve dim sort performance (#19379)
Summary:
We are already using custom comparators for sorting (for a good reason), but are still making 2 sorting passes - global sort and stable sorting to bring values into their slices. Using a custom comparator to sort within a slice allows us to avoid second sorting pass and brings up to 50% perf improvement.
t-vi I know you are moving sort to ATen, and changing THC is discouraged, but #18350 seems dormant. I'm fine with #18350 landing first, and then I can put in these changes.
cc umanwizard for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19379
Differential Revision: D15011019
Pulled By: soumith
fbshipit-source-id: 48e5f5aef51789b166bb72c75b393707a9aed57c
Author
Natalia Gimelshein