cuDNN convolution: try multiple algorithms (#33073)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/31336 https://github.com/pytorch/pytorch/issues/1664
Sometimes the cuDNN heuristics return algorithms that cannot be used. Instead of just using the first algorithm returned, we should try these algorithms one by one until one of them succeeds.
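The fallback idea can be sketched in plain Python. This is only an illustration of the "try each candidate until one succeeds" pattern, not the code in this PR: `run_first_working`, `fake_run`, and the `algo_*` names are hypothetical, and using `RuntimeError` to signal an unusable algorithm is an assumption.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def run_first_working(candidates: Iterable[str], run: Callable[[str], T]) -> T:
    """Try each candidate in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for algo in candidates:
        try:
            return run(algo)
        except RuntimeError as err:
            # Assumption: an unusable algorithm surfaces as RuntimeError.
            # Remember the error and move on to the next candidate.
            last_error = err
    raise RuntimeError("no candidate algorithm succeeded") from last_error


def fake_run(algo: str) -> str:
    """Hypothetical stand-in for running a convolution with one algorithm."""
    if algo == "algo_a":
        raise RuntimeError("algo_a cannot be used for this input")
    return f"ran {algo}"


# The first candidate fails, so the second one is used.
result = run_first_working(["algo_a", "algo_b", "algo_c"], fake_run)
print(result)  # ran algo_b
```

If every candidate fails, the sketch raises an error chained to the last failure instead of silently returning nothing.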
Benchmark:
https://github.com/zasdfgbnm/things/blob/master/2020Q1/conv-benchmark.ipynb
```python
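import torch
# Input batch and a Conv2d (3 -> 3 channels, kernel size 3, stride 3), both on the GPU
i = torch.randn(256, 3, 256, 256).cuda()
c = torch.nn.Conv2d(3, 3, 3, 3).cuda()
# IPython magic; synchronize so the GPU work is included in the timing
%timeit c(i); torch.cuda.synchronize()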
```
Before vs. after: 498 µs vs. 490 µs
My guess is that performance improves because, before this PR, we always called the heuristics to get the algorithm, whereas after this PR we call them only the first time.
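If that guess is right, the effect resembles memoizing the heuristic lookup per convolution configuration. A minimal sketch of that idea follows; it is not the PR's implementation. `AlgoCache`, `query_heuristics`, and the key layout are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Hashable


class AlgoCache:
    """Hypothetical cache: query the heuristics once per key, then reuse."""

    def __init__(self, query_heuristics: Callable[[Hashable], str]) -> None:
        self._query = query_heuristics
        self._cache: Dict[Hashable, str] = {}
        self.heuristic_calls = 0  # counts how often the heuristics ran

    def get(self, key: Hashable) -> str:
        if key not in self._cache:
            # Only the first lookup for a given key pays for the heuristics.
            self.heuristic_calls += 1
            self._cache[key] = self._query(key)
        return self._cache[key]


# Assumed stand-in for the heuristics: always picks "algo_b".
cache = AlgoCache(lambda key: "algo_b")
key = ("conv2d", (256, 3, 256, 256))
for _ in range(3):
    chosen = cache.get(key)
print(chosen, cache.heuristic_calls)  # algo_b 1
```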
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33073
Differential Revision: D20284755
Pulled By: ngimel
fbshipit-source-id: b03af37c75939ca50c2cb401c706ba26914dd10e