pytorch
ceddcf54 - istft: Use unfold_backward instead of col2im (#88060)

Commit
3 years ago
`unfold_backward` implements the same operation as `col2im` but without support for 2d kernels or dilation. However, `istft` doesn't use any of those features and `unfold_backward` actually has a faster `TensorIterator` based implementation so we should use it here instead.

In the example from #87353 I see a 2x speedup on both CPU and CUDA.

On a wider variety of sizes and inputs I still see speedups across the board, especially on CPU since `col2im` isn't parallelized but `unfold_backward` is:

| device | shape           | hop_length | Master (us) | This PR (us) | Speedup |
|--------|-----------------|------------|-------------|--------------|---------|
| CUDA   | (1, 129, 33)    | 256        | 147         | 136          | 1.08    |
|        |                 | 128        | 153         | 128          | 1.20    |
|        | (100, 129, 20)  | 256        | 181         | 147          | 1.23    |
|        |                 | 128        | 171         | 137          | 1.25    |
|        | (1000, 129, 10) | 256        | 681         | 443          | 1.55    |
|        |                 | 128        | 632         | 446          | 1.42    |
| CPU    | (1, 129, 33)    | 256        | 106         | 104          | 1.02    |
|        |                 | 128        | 103         | 81           | 1.27    |
|        | (100, 129, 20)  | 256        | 2400        | 399          | 6.02    |
|        |                 | 128        | 2150        | 313          | 6.87    |
|        | (1000, 129, 10) | 256        | 13800       | 3740         | 3.69    |
|        |                 | 128        | 12700       | 2110         | 6.02    |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88060
Approved by: https://github.com/albanD
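For readers unfamiliar with the operation being swapped: in `istft`, both `col2im` (with a 1d kernel and no dilation) and `unfold_backward` reduce to an overlap-add, where frame `i` is summed into the output starting at offset `i * hop_length`. The following is a minimal pure-Python sketch of that operation only. It is not the PyTorch implementation, and the name `overlap_add` and the toy frame values are hypothetical, chosen purely for illustration.

```python
def overlap_add(frames, hop_length):
    """Sum equal-length frames into one signal.

    Frame i is added into the output starting at index i * hop_length,
    so samples covered by several frames accumulate their contributions.
    """
    n_frames = len(frames)
    frame_len = len(frames[0])
    # Length needed to hold the last frame in full.
    out_len = (n_frames - 1) * hop_length + frame_len
    out = [0.0] * out_len
    for i, frame in enumerate(frames):
        start = i * hop_length
        for j, value in enumerate(frame):
            out[start + j] += value
    return out


# Three frames of length 4 with a hop of 2: neighbouring frames overlap by 2 samples.
frames = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
signal = overlap_add(frames, hop_length=2)
print(signal)  # [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 3.0, 3.0]
```

Each output sample depends only on the few frames that cover it, which is consistent with the commit's observation that the operation parallelizes well when implemented that way.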