pytorch
7b50e762 - optimize cat performance on CPU with TensorIterator (#30806)

Commit
4 years ago
Summary: This PR aims to improve `cat` performance on CPU. The current `cat` logic from the `TH` module has no parallelization when the input tensors are all contiguous. This code also tries to reuse the same `TensorIterator` as much as possible, to reduce the overhead of creating a `TensorIterator`; this helps when the slice being copied is small.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/30806
Differential Revision: D19275026
Pulled By: VitalyFedyunin
fbshipit-source-id: 756e9b86891f725c256b0a6981887ff06d88b053
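
To illustrate why the contiguous case parallelizes well, here is a minimal pure-Python sketch. It is not the PR's C++ code and does not use `TensorIterator`; the function name `cat_contiguous` and the `num_workers` parameter are hypothetical. It only shows that when every input is contiguous, each input maps to a disjoint slice of the output, so the copies are independent of one another.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def cat_contiguous(inputs, num_workers=4):
    """Concatenate flat, contiguous array('d') buffers into one output.

    Hypothetical illustration only: each input owns a disjoint slice of
    the output, so the per-input copies can be scheduled independently.
    """
    # Exclusive prefix sum of lengths: where each input starts in the output.
    offsets = []
    total = 0
    for buf in inputs:
        offsets.append(total)
        total += len(buf)

    # Preallocate the output once, instead of growing it per input.
    out = array("d", [0.0]) * total

    def copy_slice(job):
        start, buf = job
        # Same-length slice assignment: writes only this input's region.
        out[start:start + len(buf)] = buf

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consume the iterator so worker exceptions are re-raised here.
        list(pool.map(copy_slice, zip(offsets, inputs)))
    return out


if __name__ == "__main__":
    a = array("d", [1.0, 2.0])
    b = array("d", [3.0])
    c = array("d", [4.0, 5.0, 6.0])
    print(cat_contiguous([a, b, c]).tolist())
```

Because of Python's global interpreter lock, this sketch is not expected to run faster than a sequential loop; it only demonstrates the decomposition into independent slice copies that a native parallel implementation can exploit.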