Performance improvements for depthwise convolutions in FP16 (#22302)
Summary:
This PR activates faster depthwise convolution kernels for Volta and Turing GPUs when cuDNN >= 7.6.0 (`CUDNN_VERSION >= 7600`) is available.
The script to benchmark the current PyTorch master branch and this PR branch can be found [here](https://gist.github.com/ptrblck/4590cf20721d8f43296c9903abd4a774).
(50 warmup iterations, 1000 iterations for timing)
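
For reference, here is a minimal sketch of that timing methodology (50 warmup iterations, 1000 timed iterations, CUDA events for measurement, forward pass only). The layer shapes and parameters below are illustrative placeholders, not the exact configurations from the linked gist:

```python
import torch
import torch.nn as nn

def bench_depthwise(batch, channels, spatial, kernel_size=3, warmup=50, iters=1000):
    # groups == channels makes this a depthwise convolution
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2,
                     groups=channels,
                     bias=False).cuda().half()
    x = torch.randn(batch, channels, spatial, spatial,
                    device='cuda', dtype=torch.half)

    # warmup iterations so kernel selection and caches settle
    for _ in range(warmup):
        conv(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        conv(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per iteration

# example invocation with placeholder shapes
print(bench_depthwise(batch=32, channels=128, spatial=56))
```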
I've used https://github.com/pytorch/pytorch/issues/3265 as the basis for a similar benchmark and added a few additional configurations.
Since the results are quite long, I've uploaded them in a spreadsheet [here](https://docs.google.com/spreadsheets/d/13ByXcqg7LQUr3DVG3XpLwnJ-CXg3GUZJ3puyTMw9n2I/edit?usp=sharing).
Times are given in ms per iteration.
We've benchmarked this PR on a DGX-1 using V100 GPUs.
The current workload check in `check_cudnn_depthwise_workload` is quite long and could be moved to another file if desired.
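
For context, a schematic Python rendering of the kind of gating such a check performs; the actual function lives in ATen's C++ convolution dispatch, and the thresholds below are placeholder assumptions rather than the tuned values from this PR:

```python
import torch

def use_cudnn_depthwise(x, weight):
    """Schematic sketch: decide whether to route a depthwise conv to cuDNN."""
    is_half = x.dtype == torch.half
    # grouped conv weight has shape (out_channels, in_channels // groups, kH, kW);
    # one input channel per group means depthwise
    is_depthwise = weight.shape[1] == 1
    cap = torch.cuda.get_device_capability(x.device)
    is_volta_or_turing = cap[0] == 7            # sm_70 / sm_75
    cudnn_ok = torch.backends.cudnn.version() >= 7600
    # The real heuristic additionally whitelists benchmarked combinations of
    # batch size, channel count, spatial size, and stride; this threshold is
    # a made-up stand-in for that per-workload table.
    large_enough = x.shape[0] * x.shape[1] >= 1024
    return (is_half and is_depthwise and is_volta_or_turing
            and cudnn_ok and large_enough)
```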
CC ngimel (Thanks for the support while benchmarking it ;) )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22302
Differential Revision: D16115057
Pulled By: ezyang
fbshipit-source-id: bad184658518e73b4d6b849d77e408f5a7a757de