pytorch
93d2e509 - Improve performance of index_select by avoiding item (#63008)

Commit
3 years ago
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: item already pulls all Tensor content onto the host (albeit one-by-one), which incurs very expensive memory transfers. This way we'll do it all at once.

From a CPU perspective: item has a lot of overhead as a native function in comparison to simply using a pointer.

Overall there's still lots of performance gains to be had, but this is a small change that should take us into a more usable landscape. This doesn't land a separate benchmark, but I postulate that's not necessary to decide on the benefit of this (we'll also see if it shows up indirectly), however is still a good follow-up item.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
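
The trade-off described in the summary can be modeled with a small toy sketch. This is not PyTorch code and not the change made in this commit: ToyDeviceIndex, select_per_item, and select_bulk are hypothetical names invented for illustration, and the transfers counter only simulates device-to-host copies. It assumes that each per-element item call costs one transfer, while a single bulk copy costs one transfer for the whole index.

```python
class ToyDeviceIndex:
    """Hypothetical stand-in for an index tensor held in device memory."""

    def __init__(self, values):
        self._values = list(values)
        self.transfers = 0  # number of simulated device-to-host copies

    def __len__(self):
        return len(self._values)

    def item(self, i):
        # Models a per-element read: one simulated transfer per call.
        self.transfers += 1
        return self._values[i]

    def to_host(self):
        # Models a single bulk copy of the whole index to the host.
        self.transfers += 1
        return list(self._values)


def select_per_item(src, index):
    # Reads every index through item(), so transfers grow with len(index).
    return [src[index.item(i)] for i in range(len(index))]


def select_bulk(src, index):
    # Copies the index once, then reads it from host memory.
    host_index = index.to_host()
    return [src[j] for j in host_index]


src = [10.0, 20.0, 30.0, 40.0]
slow_index = ToyDeviceIndex([3, 0, 2, 0])
fast_index = ToyDeviceIndex([3, 0, 2, 0])

slow = select_per_item(src, slow_index)
fast = select_bulk(src, fast_index)

print(slow == fast)          # both strategies select the same elements
print(slow_index.transfers)  # one simulated transfer per index element
print(fast_index.transfers)  # one simulated transfer in total
```

Both functions return the same selection; only the number of simulated transfers differs, which is the cost the summary attributes to calling item once per element.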