pytorch
bcf8752f - updated launch bounds for trilinear 3d (#59999)

Commit
3 years ago
Summary:
Updates launch bounds for the upsample_trilinear_3d forward and backward kernels to remove register spilling into local memory. Improves forward-pass runtime by a factor of 3-4x; backward-pass runtime is unchanged (probably a different bottleneck).

Timing data (measured on an Nvidia Titan V GPU):
![TrilinearTimingData](https://user-images.githubusercontent.com/22803332/121979658-72f19200-cd3f-11eb-9363-c00e2c4eea6d.PNG)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59999

Reviewed By: zou3519

Differential Revision: D29185976

Pulled By: ngimel

fbshipit-source-id: 0b2313e70e45c53938cd7262464d3aa4fab8da4a
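The link between launch bounds and register spilling can be illustrated with a small host-side calculation. The sketch below is not taken from the commit: the helper name `reg_budget_per_thread` is hypothetical, the hardware limits are assumed values for a Volta-class GPU such as the Titan V (compute capability 7.0), and the block sizes are illustrative only, since the commit message does not state the old or new bounds. It estimates the per-thread register budget the compiler has to stay within for a given `__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` annotation.

```cpp
#include <algorithm>
#include <cassert>

// Assumed limits for a Volta-class GPU (compute capability 7.0).
// Other architectures may differ; consult the CUDA programming guide.
constexpr int kRegsPerSM = 65536;       // 32-bit registers available per SM
constexpr int kMaxRegsPerThread = 255;  // architectural per-thread cap

// Hypothetical helper: approximate upper bound on registers per thread that
// still allows `min_blocks_per_sm` blocks of `max_threads_per_block` threads
// to be resident on one SM. This mirrors the budget implied by
//   __launch_bounds__(max_threads_per_block, min_blocks_per_sm)
// and ignores register allocation granularity, so the real limit can be
// slightly lower.
inline int reg_budget_per_thread(int max_threads_per_block,
                                 int min_blocks_per_sm = 1) {
    assert(max_threads_per_block > 0 && min_blocks_per_sm > 0);
    const int resident_threads = max_threads_per_block * min_blocks_per_sm;
    return std::min(kMaxRegsPerThread, kRegsPerSM / resident_threads);
}
```

Under these assumptions, a bound of 1024 threads per block leaves about 64 registers per thread, while a bound of 512 leaves about 128. A kernel that needs more registers than its budget has the excess spilled to local memory, which is the effect the summary describes removing.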