pytorch
92e36240 - fix nonzero perf regression (#58468)

Commit
3 years ago
fix nonzero perf regression (#58468)

Summary:
https://github.com/pytorch/pytorch/issues/55292 introduced a perf regression for CUDA nonzero; this fixes it.
nvcc is still pretty bad at unrolling loops whose bounds are not known at compile time, which makes the `write_indices` kernels ~5x slower than they should be.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58468

Reviewed By: mruberry

Differential Revision: D28511147

Pulled By: ngimel

fbshipit-source-id: fe7303ec77da1abbe5e874093eca247b3919616f
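The unrolling issue named in the commit message can be illustrated with a small host-side C++ sketch. It is not the code from this commit: `kMaxDims`, `Dims`, and both function names are hypothetical, and the loop bodies only show the general shape of converting a flat index into per-dimension indices. The first function loops up to the runtime value `ndim`; the second loops over a compile-time constant and skips unused dimensions, which is one common way to give the compiler a trip count it can unroll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical upper bound on tensor rank, chosen for illustration only.
constexpr int kMaxDims = 8;

// Sizes padded to a fixed length so the struct can be passed by value
// (as kernel arguments usually are). Unused trailing entries are ignored.
struct Dims {
  int64_t sizes[kMaxDims];
};

// Baseline: the trip count depends on the runtime value `ndim`,
// so the compiler cannot fully unroll the loop.
void unravel_runtime_bound(int64_t flat, const Dims& dims, int ndim,
                           int64_t* out) {
  int64_t div = 1;
  for (int dim = ndim - 1; dim >= 0; --dim) {
    const int64_t size = dims.sizes[dim];
    out[dim] = (flat / div) % size;
    div *= size;
  }
}

// Variant: iterate over the constant kMaxDims and skip dimensions the
// tensor does not have. The trip count is now known at compile time.
void unravel_fixed_bound(int64_t flat, const Dims& dims, int ndim,
                         int64_t* out) {
  assert(ndim >= 0 && ndim <= kMaxDims);
  int64_t div = 1;
  // In a CUDA kernel, a `#pragma unroll` would typically precede this loop.
  for (int dim = kMaxDims - 1; dim >= 0; --dim) {
    if (dim >= ndim) continue;  // guard replaces the runtime loop bound
    const int64_t size = dims.sizes[dim];
    out[dim] = (flat / div) % size;
    div *= size;
  }
}
```

Both functions produce the same indices; only the loop structure differs. Whether the fixed-bound form is actually faster depends on the compiler and target, so any speedup should be measured rather than assumed.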
Author
Natalia Gimelshein