UpSample-nearest cuda kernel update (#21694)
Summary:
Update the nearest-neighbor upsampling CUDA kernel:
1. Avoid atomicAdd for better fp16 performance.
2. Use a better launch configuration for 2D input.
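
For context on item 1, here is a minimal CPU sketch of one common way to drop atomics from a nearest-neighbor backward pass: iterate over input-gradient elements and gather from the output gradient, rather than scattering from each output element. This is an illustration under stated assumptions, not the code in this PR. The function names are hypothetical, the example is 1D, and the index mapping uses integer arithmetic, which may differ from the floating-point scale used in the actual kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: source index for output position i under
// nearest-neighbor upsampling (integer mapping, an assumption of this sketch).
inline std::size_t nearest_src(std::size_t i, std::size_t in_size,
                               std::size_t out_size) {
  return (i * in_size) / out_size;
}

// Scatter form: one unit of work per output-gradient element. Several outputs
// map to the same input slot, so a parallel version of this loop would need
// atomicAdd on grad_in.
std::vector<float> backward_scatter(const std::vector<float>& grad_out,
                                    std::size_t in_size) {
  assert(in_size > 0 && grad_out.size() >= in_size);
  std::vector<float> grad_in(in_size, 0.0f);
  for (std::size_t i = 0; i < grad_out.size(); ++i) {
    grad_in[nearest_src(i, in_size, grad_out.size())] += grad_out[i];
  }
  return grad_in;
}

// Gather form: one unit of work per input-gradient element. Each slot is
// written exactly once by its owner, so no atomics are needed.
std::vector<float> backward_gather(const std::vector<float>& grad_out,
                                   std::size_t in_size) {
  assert(in_size > 0 && grad_out.size() >= in_size);
  const std::size_t out_size = grad_out.size();
  std::vector<float> grad_in(in_size, 0.0f);
  for (std::size_t j = 0; j < in_size; ++j) {
    // Outputs i with nearest_src(i) == j form the half-open range
    // [ceil(j * out / in), ceil((j + 1) * out / in)).
    const std::size_t begin = (j * out_size + in_size - 1) / in_size;
    const std::size_t end = ((j + 1) * out_size + in_size - 1) / in_size;
    float acc = 0.0f;
    for (std::size_t i = begin; i < end; ++i) {
      acc += grad_out[i];
    }
    grad_in[j] = acc;
  }
  return grad_in;
}
```

In a CUDA version of the gather form, each thread would own one grad_in element, so the accumulation could stay in a register and finish with a single store.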
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21694
Differential Revision: D15875791
Pulled By: ezyang
fbshipit-source-id: 426fc5d5f0c0cdf58bfa1a2b564f17a9ea286fa4