pytorch
7b50e762 - optimize cat performance on CPU with TensorIterator (#30806)

Commit

4 years ago

optimize cat performance on CPU with TensorIterator (#30806)

Summary:
This PR improves `cat` performance on CPU. The current `cat` logic in the `TH` module is not parallelized when all input tensors are contiguous. This change also reuses the same `TensorIterator` wherever possible to reduce the overhead of constructing one, which helps when each copied slice is small.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/30806

Differential Revision: D19275026

Pulled By: VitalyFedyunin

fbshipit-source-id: 756e9b86891f725c256b0a6981887ff06d88b053
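The idea behind parallelizing a contiguous `cat` can be pictured with a minimal standard-library sketch. This is not the ATen code from the PR: the function name `cat_contiguous` and the `max_workers` parameter are hypothetical, flat `array` buffers stand in for contiguous tensors, and Python threads only illustrate how the work partitions (the GIL prevents a real speedup here). Each input gets a precomputed offset into one preallocated output, so the copies write disjoint regions and can run independently.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def cat_contiguous(chunks, max_workers=4):
    """Concatenate flat float64 buffers into one preallocated output.

    Hypothetical illustration only; not PyTorch's implementation.
    """
    sizes = [len(chunk) for chunk in chunks]
    # Exclusive prefix sum: where each input starts in the output.
    offsets = [0] + list(accumulate(sizes))[:-1]
    # Allocate the output once, up front, filled with zeros.
    out = array("d", [0.0]) * sum(sizes)

    with memoryview(out) as out_view:

        def copy_one(job):
            offset, chunk = job
            # Each job writes a disjoint slice, so no locking is needed.
            out_view[offset:offset + len(chunk)] = memoryview(chunk)

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Consume the iterator so worker exceptions are raised here.
            list(pool.map(copy_one, zip(offsets, chunks)))

    return out
```

Calling `cat_contiguous([array("d", [1.0, 2.0]), array("d", [3.0])])` returns a single `array` holding the inputs back to back. Computing the offsets once before any copy starts is what makes the copies independent of one another.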

Author

mingfeima

Committer

facebook-github-bot

Parents

ad90c97c
