Small fixes to reduce TensorIterator overhead for the common case of inputs and outputs of the same type (#27457)
Summary:
1) Short-circuits the common-type computation and type-promotion logic for the common case where the operands and the result all have the same type
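A minimal standalone sketch of the idea in item 1 (not the actual ATen code; the names `all_same_type` and `compute_common_type`, and the trimmed-down `ScalarType` enum, are hypothetical): when every operand already shares one type, the full promotion machinery can be skipped.

```cpp
#include <cassert>
#include <vector>

// Trimmed-down stand-in for ATen's scalar-type enum (hypothetical).
enum class ScalarType { Float, Double, Int };

// Returns true when all operands share one type, so promotion can be skipped.
bool all_same_type(const std::vector<ScalarType>& operand_types) {
  for (const auto& t : operand_types) {
    if (t != operand_types.front()) return false;
  }
  return true;
}

ScalarType compute_common_type(const std::vector<ScalarType>& operand_types) {
  // Fast path: all operands (and hence the result) have the same type.
  if (all_same_type(operand_types)) {
    return operand_types.front();
  }
  // The full type-promotion rules would run here; elided in this sketch.
  return ScalarType::Double;
}
```

The fast path turns an O(promotion-table-lookup) computation into a single linear scan over a handful of operand types, which is what makes it cheap enough for the common case.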
2) Improves the performance of the memory-overlap check by returning MemoryOverlap::FULL immediately when the two tensors are the same, and skips the check from TensorIterator entirely in that case
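A simplified sketch of the fast path in item 2 (hypothetical names; the real ATen check works on tensor storages and handles strided layouts): identical tensors trivially overlap fully, so the range comparison can be skipped.

```cpp
#include <cassert>
#include <cstdint>

enum class MemOverlap { NO, PARTIAL, FULL };

// A tensor reduced to its data pointer and byte extent (hypothetical stand-in).
struct TensorRef {
  const void* data;
  int64_t nbytes;
};

MemOverlap get_overlap(const TensorRef& a, const TensorRef& b) {
  // Fast path from item 2: the same tensor fully overlaps itself,
  // so return early instead of comparing memory ranges.
  if (a.data == b.data && a.nbytes == b.nbytes) {
    return MemOverlap::FULL;
  }
  const auto* a_begin = static_cast<const char*>(a.data);
  const auto* b_begin = static_cast<const char*>(b.data);
  // Disjoint byte ranges cannot overlap.
  if (a_begin + a.nbytes <= b_begin || b_begin + b.nbytes <= a_begin) {
    return MemOverlap::NO;
  }
  return MemOverlap::PARTIAL;
}

// Shared buffer used by the usage examples below.
static char kBuf[16];
```

In-place operations pass the same tensor as input and output, so this early return fires on every in-place call, which is exactly the hot path the PR targets.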
3) Changes the default inline size of DimVector from 5 to 6, so it does not need to be resized in the common case of a binary operation: the `strides` DimVector must hold at least 2*num_tensors elements, which is 6 for an operation with two inputs and one output
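The arithmetic behind item 3 can be sketched as follows (the constant and helper names are hypothetical, not the actual DimVector implementation): a small-vector type only avoids a heap allocation when the requested size fits its inline capacity.

```cpp
#include <cassert>
#include <cstddef>

// Inline capacity of the small vector, raised from 5 to 6 by this change
// (name hypothetical; DimVector is an LLVM-style SmallVector in ATen).
constexpr std::size_t kDimVectorInlineSize = 6;

// The `strides` vector needs at least 2 * num_tensors entries.
constexpr std::size_t strides_capacity_needed(std::size_t num_tensors) {
  return 2 * num_tensors;
}

// True when the strides vector fits the inline buffer, i.e. no resize
// (and no heap allocation) is needed.
constexpr bool fits_inline(std::size_t num_tensors) {
  return strides_capacity_needed(num_tensors) <= kDimVectorInlineSize;
}
```

With the old inline size of 5, a binary op (3 tensors, needing 6 entries) spilled to the heap every time; raising the capacity to 6 keeps that common case entirely on the stack.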
4) If `offset` is 0 (the common non-broadcasting case), don't fill the `strides` vector with zeros, because every element will subsequently be written.
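Item 4 can be illustrated with this sketch (function names are hypothetical, not the actual TensorIterator code): the leading `offset` entries of `strides` are zeroed only when broadcasting requires it; when `offset` is 0, the fill is skipped because the copy overwrites every element anyway.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write `n` computed strides after `offset` zero entries (sketch).
void write_strides(int64_t* strides, std::size_t offset,
                   const int64_t* computed, std::size_t n) {
  if (offset != 0) {
    // Broadcasting case: the leading `offset` dimensions get stride 0.
    std::fill(strides, strides + offset, int64_t{0});
  }
  // offset == 0 is the common non-broadcasting case: no fill is needed,
  // since this copy writes every element of the destination.
  std::copy(computed, computed + n, strides + offset);
}

// Small helper for exercising the sketch with two computed strides {8, 4}.
std::array<int64_t, 4> demo(std::size_t offset) {
  std::array<int64_t, 4> out{};
  const int64_t computed[] = {8, 4};
  write_strides(out.data(), offset, computed, 2);
  return out;
}
```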
Combined, these changes reduce the overhead of a simple in-place operation from 1.02 us to 0.74 us.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27457
Test Plan: should be covered by existing tests
Differential Revision: D17784532
Pulled By: ngimel
fbshipit-source-id: e6a8ee58be5de14461bdbc2e2b0b6d16a96c309f
Author: Natalia Gimelshein