pytorch
fe90313d - Avoid index_put_ overhead in histogram kernel's inner loop (#67815)


Commit

2 years ago

Avoid index_put_ overhead in histogram kernel's inner loop (#67815)

Summary:
**TLDR**: Makes torch.histc run 400x faster on large inputs. Should fix [a broken test on internal CI](https://www.internalfb.com/intern/test/281475013640093/).

HistogramKernel presently calls torch.Tensor.index_put_ once for each element of its input tensor. Obtaining a data pointer and manipulating it directly avoids the considerable dispatch overhead from calling index_put_. Behavior is unchanged because the tensor being operated on is known to be contiguous and in CPU memory.

Fixes performance regression introduced in https://github.com/pytorch/pytorch/pull/65318.

Benchmark: time taken to compute histc on a tensor with 10,000,000 elements

1. Before https://github.com/pytorch/pytorch/pull/65318: **0.003s**
2. After https://github.com/pytorch/pytorch/pull/65318: **2.154s**
3. After this change: **0.005s**

Benchmark code:
```
import torch as t
from timeit import default_timer as timer

x = t.randperm(10000000, dtype=t.float32)
start = timer()
t.histc(x)
end = timer()
print(end - start)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67815

Reviewed By: anjali411

Differential Revision: D32357663

Pulled By: saketh-are

fbshipit-source-id: f8fa59173ea4772c8ad1332548ef4d9ea8f01178
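The inner-loop idea described in the summary can be sketched in plain Python. This is an analogy only, not the C++ kernel from the PR: the function name `histc_reference` and its parameters are hypothetical, the contiguous `array` buffer stands in for the data pointer, and the binning rule (equal-width bins, out-of-range values ignored, the maximum value landing in the last bin) is assumed to match the documented behavior of `torch.histc` for finite inputs.

```python
from array import array


def histc_reference(values, bins, lo, hi):
    """Count `values` into `bins` equal-width bins spanning [lo, hi].

    Hypothetical illustration: each element costs one index computation
    and one in-place increment on a contiguous buffer, with no
    per-element call into a general-purpose indexing routine.
    """
    if bins <= 0 or hi <= lo:
        raise ValueError("need bins > 0 and hi > lo")

    counts = array("q", [0] * bins)  # contiguous buffer of 64-bit counters
    scale = bins / (hi - lo)         # bins per unit of input range

    for v in values:
        if v < lo or v > hi:
            continue                 # out-of-range values are skipped
        # Values equal to `hi` would compute index `bins`; clamp to the last bin.
        pos = min(int((v - lo) * scale), bins - 1)
        counts[pos] += 1             # direct write into the buffer
    return list(counts)


print(histc_reference([1.0, 2.0, 1.0], bins=4, lo=0.0, hi=3.0))
```

The sketch does not reproduce the timings quoted above; it only shows the shape of a loop that writes counts directly instead of dispatching an indexing operation per element.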

Author

saketh-are

Committer

facebook-github-bot

Parents

61a94495
