pytorch
e2ddaec5 - Reverting launch bounds change in topK that induced a regression in perf (#63431)

Summary: [topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)

Running this script on NVIDIA containers 21.08 vs. 21.07, we see the following perf drops:

- topk(input=(dtype=torch.float16, shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63
- topk(input=(dtype=torch.float32, shape=[120000]), k=12000, dim=0, sorted=False) - 0.55
- topk(input=(dtype=torch.float16, shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55
- topk(input=(dtype=torch.float32, shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33

The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time.

I narrowed the regression down to this commit: https://github.com/pytorch/pytorch/pull/60314, which reduced the kernel's launch bounds from 1024 to 512 threads. The evidence originally given for the 1024 → 512 change showed no regression because its benchmark used input shapes much smaller than the tensor shapes on which I observe the regression here. I suggest reverting to 1024: with 512 there was no meaningful improvement for small inputs, and a major perf regression for large tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431
Reviewed By: mruberry
Differential Revision: D30384087
Pulled By: ngimel
fbshipit-source-id: 11eecbba82a069b1d4579d674c3f644ab8060ad2
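The actual benchmark script is in the attached topkwsyncs.zip and is not reproduced here. As a rough illustration of how one such measurement could be taken, here is a minimal sketch: the shape/k/dim/sorted values come from the first case in the list above, while the CUDA-event timing helper and iteration counts are my own assumptions, not taken from the attachment.

```python
import torch

def time_topk(x, k, dim, sorted_, iters=100):
    # Warm up, then time `iters` topk calls on the GPU using CUDA events.
    for _ in range(10):
        torch.topk(x, k, dim=dim, sorted=sorted_)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.topk(x, k, dim=dim, sorted=sorted_)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per call

# One of the regressing cases from the list above.
x = torch.randn(60, 201600, dtype=torch.float16, device="cuda")
t = time_topk(x, k=2000, dim=1, sorted_=True)
print(f"topk: {t:.3f} ms/iter")

# Given timings t_2107 and t_2108 from the two containers, the relative
# drop would be computed as in the message above:
#   rel_drop = (t_2108 - t_2107) / t_2107
```

Run once in each container against the same PyTorch op, this yields the per-case timings from which the relative drops above would be derived.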