changed launch bounds for upsample_linear1d fwd, bwd from 1024 to 512 (#61307)
Summary:
Changed the launch bounds for upsample_linear1d_out_frame and upsample_linear1d_backward_out_frame from 1024 to 512 threads per block. This improves performance, as shown in the timing data below. It does not completely eliminate local memory (lmem) usage: lmem usage drops from 40-48 bytes to 8-16 bytes, and it is not yet clear why some remains.
Timing data (NVIDIA Titan V GPU):
![UpsampleLinear1dTimingData](https://user-images.githubusercontent.com/22803332/124677708-e20d6280-de75-11eb-8187-fb50ec89dc50.PNG)
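A rough illustration of why lowering the bound can reduce lmem usage: `__launch_bounds__(N)` tells nvcc that a kernel will be launched with at most N threads per block, so the compiler must keep each thread's register count low enough for N threads to share the register file, spilling to local memory when it cannot. The arithmetic below is a sketch under stated assumptions: Volta-class limits (65,536 32-bit registers per block, at most 255 registers per thread), with register allocation granularity ignored. `max_registers_per_thread` is a hypothetical helper, not PyTorch or CUDA code.

```python
# Back-of-the-envelope register budget implied by __launch_bounds__(N).
# Assumptions: Volta-class GPU (e.g. Titan V, sm_70) with 65,536 32-bit
# registers available to one block and a 255-register per-thread cap.
# Register allocation granularity is ignored. Hypothetical helper only.

REGISTERS_PER_BLOCK = 65_536    # 64K 32-bit registers (assumed sm_70 limit)
MAX_REGISTERS_PER_THREAD = 255  # architectural per-thread limit (assumed)


def max_registers_per_thread(max_threads_per_block: int) -> int:
    """Largest per-thread register count that still lets a full block of
    `max_threads_per_block` threads fit in the register file."""
    budget = REGISTERS_PER_BLOCK // max_threads_per_block
    return min(budget, MAX_REGISTERS_PER_THREAD)


for bound in (1024, 512):
    print(f"__launch_bounds__({bound}): <= {max_registers_per_thread(bound)} registers/thread")
```

Under these assumptions, halving the bound from 1024 to 512 doubles the per-thread register budget from 64 to 128, which is consistent with fewer spills. The actual register counts nvcc chose for these two kernels are not reported here, and spills are only one possible source of local memory use.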
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61307
Reviewed By: heitorschueroff
Differential Revision: D29662137
Pulled By: ngimel
fbshipit-source-id: 9653672ee17f25b75a02f295f388a78327091431