Bounds checking for functor execution in vectorized/unrolled kernels (#33642)
Summary:
The current logic for vectorized/unrolled operations in CUDALoops.cuh applies bounds checking to loads and stores, [but not to the actual functor's execution](https://github.com/pytorch/pytorch/blob/16d6c17845426294274850f9161e292345f2afa5/aten/src/ATen/native/cuda/CUDALoops.cuh#L264). In other words, for a block acting on the tail of a tensor that doesn't require the whole block to participate in memory transactions, many threads execute their functor on uninitialized data. For functors that only communicate with the outside world via the bounds-checked loads and stores, that's ok. The threads acting on garbage data never actually write their results. But [my proposed inf/nan checking kernel](https://github.com/pytorch/pytorch/pull/33366/files#diff-9701a2b34900195d160bdc234e001b79R70-R79) has the additional side effect of writing to a `found_inf` flag in global memory. For irregularly-shaped tensors where tail threads execute the functor on garbage data, these threads would sometimes see and report spurious infs/nans.
In general, we can't guarantee functors won't have side effects. For safety (and efficiency), we should apply bounds checking to the functor execution as well as to the loads and stores.
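To make the failure mode concrete, here is a minimal host-side C++ sketch (not the actual PyTorch code; `run_block`, `check_value`, and the register-garbage stand-in are all hypothetical) that emulates one block acting on a tail of `n` valid elements. Loads are bounds-checked, but a side-effecting functor run unguarded on the out-of-bounds threads leaks garbage out through the `found_inf` flag:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

constexpr int kBlockSize = 8;

// Hypothetical functor with a side effect: reports any inf/nan it sees.
void check_value(float v, bool* found_inf) {
    if (std::isinf(v) || std::isnan(v)) *found_inf = true;
}

// Emulate one block processing a tail of n valid elements (n < kBlockSize).
// Threads past the tail skip the load, so their "register" v keeps garbage;
// we model that garbage as inf to show the worst case.
bool run_block(const float* data, int n, bool guard_functor) {
    bool found_inf = false;
    for (int tid = 0; tid < kBlockSize; ++tid) {  // stand-in for the threads
        float v = std::numeric_limits<float>::infinity();  // uninitialized register
        if (tid < n) v = data[tid];               // bounds-checked load
        if (!guard_functor || tid < n)
            check_value(v, &found_inf);           // functor execution
        // the store would also be bounds-checked; omitted here
    }
    return found_inf;
}
```

With five finite inputs and `n = 5`, the unguarded variant spuriously reports an inf from the three tail threads, while the guarded variant does not; the proposed fix corresponds to always passing the equivalent of `guard_functor = true`.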
Is it possible that other elementwise kernels (in addition to the vectorized/unrolled implementations) are also executing functors unconditionally? That would cause similar failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33642
Differential Revision: D20062985
Pulled By: ngimel
fbshipit-source-id: 65b8d75a001ce57865ed1c0cf89105d33f3f4dd4