pytorch
e6a8d15a - cpu_kernel_vec: Hoist stride checks out of loop (#68962)


Commit

2 years ago

cpu_kernel_vec: Hoist stride checks out of loop (#68962)

Summary:
`cpu_kernel_vec` does stride checks to determine whether to use the vectorized or scalar inner loop. Since it uses a 1d `for_each` loop, it re-does these stride checks after every loop over the inner dimension. For iterators with small inner dimensions, this means a significant proportion of the time may be spent just on stride checks. This changes it to use a 2d loop so the stride checks are further amortized.

With the below `copy_` benchmark, it saves 50% of the callgrind instruction count from 28.4 Million to 13.5 Million and 30% time speedup from 22.8 us to 16.4 us on my machine.

```
from torch.utils.benchmark import Timer
import timeit

timer = Timer(
    stmt="b.copy_(a);",
    setup="""
    auto a = at::rand({10000, 8}, at::kComplexDouble).slice(0, 0, -1, 2);
    auto b = at::empty_like(a);
    """,
    num_threads=1,
    language='c++',
    timer=timeit.default_timer
)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/68962

Reviewed By: mrshenli

Differential Revision: D32684191

Pulled By: ngimel

fbshipit-source-id: 582af038314a0f999f43669e66edace38ff8d2dc
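The effect of hoisting the check can be illustrated with a small pure-Python model. This is only a sketch of the idea, not the TensorIterator code touched by this commit: the function names `copy_rows_1d` and `copy_rows_2d` are hypothetical, strides are counted in elements rather than bytes, and a slice assignment stands in for the vectorized inner loop. The shape used (5000 rows of 8 elements, outer stride 16) is assumed to mirror the sliced tensor in the benchmark above.

```python
def _copy_row_vec(dst, src, dst_base, src_base, inner, inner_stride):
    # Stand-in for the vectorized inner loop: copy a contiguous row as one slice.
    dst[dst_base:dst_base + inner] = src[src_base:src_base + inner]


def _copy_row_scalar(dst, src, dst_base, src_base, inner, inner_stride):
    # Scalar fallback: copy one element at a time, honoring the inner stride.
    for j in range(inner):
        dst[dst_base + j] = src[src_base + j * inner_stride]


def copy_rows_1d(dst, src, outer, inner, inner_stride, outer_stride):
    """Model of a 1d loop: the stride check is repeated for every row."""
    checks = 0
    for i in range(outer):
        checks += 1  # check performed again on each pass over the inner dimension
        row_copy = _copy_row_vec if inner_stride == 1 else _copy_row_scalar
        row_copy(dst, src, i * inner, i * outer_stride, inner, inner_stride)
    return checks


def copy_rows_2d(dst, src, outer, inner, inner_stride, outer_stride):
    """Model of a 2d loop: the stride check is done once, before the outer loop."""
    checks = 1  # single check, amortized over all rows
    row_copy = _copy_row_vec if inner_stride == 1 else _copy_row_scalar
    for i in range(outer):
        row_copy(dst, src, i * inner, i * outer_stride, inner, inner_stride)
    return checks


# Assumed shape: every other row of a 10000 x 8 buffer, written to a dense output.
outer, inner = 5000, 8
src = list(range(10000 * 8))
dst_1d = [0] * (outer * inner)
dst_2d = [0] * (outer * inner)

checks_1d = copy_rows_1d(dst_1d, src, outer, inner, inner_stride=1, outer_stride=16)
checks_2d = copy_rows_2d(dst_2d, src, outer, inner, inner_stride=1, outer_stride=16)
print(checks_1d, checks_2d)
```

Both variants produce the same output; only the number of stride checks differs. With an inner dimension of just 8 elements, the per-row check in the 1d model is a large share of the total work, which is the situation the commit message describes.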

Author

peterbell10

Committer

facebook-github-bot

Parents

61ea2fc3
