pytorch
f89ae9cb - Moves grid_sampler to autocast promote list (#58618)

Commit
4 years ago
Moves grid_sampler to autocast promote list (#58618) Summary: Should close https://github.com/pytorch/pytorch/issues/42218 Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list. `grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618 Reviewed By: mruberry Differential Revision: D29257199 Pulled By: ngimel fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
Author
Michael Carilli
Parents
Loading