pytorch
f993ceff - TensorIteratorReduce: Avoid tensor operations in parallel_for (#58655)

Commit
3 years ago
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58655

Ref gh-56794

The two-pass reduction calls `copy_` and `select` inside a parallel region. The `copy_` can just be moved outside of the parallel region, but avoiding the `select` call is more complicated because it's needed to construct the `TensorIterator`. Instead, I factor out a `serial_for_each` free function that just takes pointers and strides, then manually advance the pointer to the thread-specific slice of data.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28735330

Pulled By: ngimel

fbshipit-source-id: 8e096eb5801af9381ebd305e3ae7796a79b86298
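The pointer-and-stride idea in the summary can be illustrated with a minimal standalone sketch. This is not the ATen code from the PR: `serial_sum` and `two_pass_sum` are hypothetical names, `std::thread` stands in for `at::parallel_for`, and a plain `std::vector<float>` stands in for the tensor and its per-thread buffer. The point is only that the worker body touches raw pointers and byte strides, while all allocation happens outside the parallel region.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a factored-out serial kernel: it receives only
// raw pointers and a byte stride, so worker threads never touch a
// tensor-like object. Adds n strided float inputs into *out.
void serial_sum(char* out, const char* in, int64_t in_stride, int64_t n) {
  float acc = *reinterpret_cast<float*>(out);
  for (int64_t i = 0; i < n; ++i) {
    acc += *reinterpret_cast<const float*>(in + i * in_stride);
  }
  *reinterpret_cast<float*>(out) = acc;
}

// Two-pass sum: pass 1 fills one buffer slot per thread in parallel,
// pass 2 reduces the buffer serially.
float two_pass_sum(const std::vector<float>& data, int num_threads) {
  assert(num_threads > 0);
  // The per-thread buffer is allocated and zeroed outside the parallel region.
  std::vector<float> buffer(num_threads, 0.0f);
  const char* in_base = reinterpret_cast<const char*>(data.data());
  char* buf_base = reinterpret_cast<char*>(buffer.data());
  const int64_t n = static_cast<int64_t>(data.size());
  const int64_t stride = sizeof(float);

  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([=] {
      // Contiguous chunk [begin, end) owned by thread t.
      const int64_t begin = n * t / num_threads;
      const int64_t end = n * (t + 1) / num_threads;
      // Manually advance both pointers to this thread's slice instead of
      // creating a view of the buffer inside the parallel region.
      serial_sum(buf_base + t * stride, in_base + begin * stride,
                 stride, end - begin);
    });
  }
  for (auto& w : workers) {
    w.join();
  }

  // Second pass: serial reduction over the per-thread partial sums.
  float result = 0.0f;
  serial_sum(reinterpret_cast<char*>(&result), buf_base, stride, num_threads);
  return result;
}
```

Calling `two_pass_sum(values, 4)` splits `values` into four contiguous chunks, one per worker. Each worker writes only to its own buffer slot, so the parallel pass needs no locking.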