pytorch
895735c6 - TensorIterator: Avoid nesting two levels of function_ref in for_each (#53613)

Commit · 4 years ago
TensorIterator: Avoid nesting two levels of function_ref in for_each (#53613)

Summary: When calling `TensorIterator::for_each` with a 1d loop, it creates a `function_ref` for the 1d iteration, then wraps it with `LOOP_WRAPPER` to transform it into a 2d loop. That 2d loop then gets wrapped in another `function_ref`. This can result in significant overhead if the 1d inner loop is over a small number of elements. Instead, this wraps the 1d loop before type-erasure so only one level of `function_ref` is introduced.

A simple benchmark demonstrates this is a win:

```python
import torch
a = torch.rand((10000, 2))[::2]
%timeit a + a
```

Note the 2D tensor cannot be coalesced into 1D, and both `cpu_kernel` and `cpu_kernel_vec` use 1D for_each. On master, this takes 42 us, but with this change it's down to 32 us.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53613
Reviewed By: VitalyFedyunin
Differential Revision: D26947143
Pulled By: ezyang
fbshipit-source-id: 5189ada0d82bbf74170fb446763753f02478abf6