pytorch
e4be80c1 - simplify cpu_kernel to not have contiguous special case (#58830)

Commit
3 years ago
simplify cpu_kernel to not have contiguous special case (#58830)

Summary:
Per title.

`unroll_contiguous_scalar_checks` tries to verify that all arguments (including outputs) are contiguous, except for at most one scalar (with stride 0). It then calls the passed lambda with the index of the scalar arg if this verification succeeded, or with 0 if the args were not contiguous or there was no scalar. Depending on the value of this index (0 = not found), a different function can be called: in vectorized kernels it’s the vectorized loop if the args are contiguous plus a scalar, and the basic loop if not.

This makes sense for the vectorized kernel (the vectorized loop can still be used in some broadcasted cases), but all the other kernels (cpu_kernel, serial_cpu_kernel, cpu_kernel_multiple_outputs) don’t even use the idx argument in their lambda, so regardless of what `unroll_contiguous_scalar_checks` does, they'll do the same thing. There is no point in calling `unroll_contiguous_scalar_checks` in that case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58830

Reviewed By: zou3519, mruberry

Differential Revision: D28632668

Pulled By: ngimel

fbshipit-source-id: c6db3675933184e17cc249351c4f170b45d28865
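The dispatch described in the summary can be illustrated with a small standalone sketch. This is not the ATen implementation: `find_contiguous_scalar`, `unroll_checks_sketch`, the `Loop` enum, and the three kernel stand-ins are hypothetical names invented here. The sketch assumes strides are given in bytes, with argument 0 being the output, and it leaves out the fully contiguous, no-scalar case. It only shows why a callback that ignores `idx` makes the check redundant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the check described in the commit message.
// Argument 0 is treated as the output, arguments 1..N as inputs (an assumption).
// Returns the index of the single stride-0 (scalar) input if every other
// argument is contiguous, or 0 if that pattern is not found.
std::size_t find_contiguous_scalar(const std::vector<std::int64_t>& strides,
                                   const std::vector<std::int64_t>& elem_sizes) {
  assert(strides.size() == elem_sizes.size());
  std::size_t scalar_idx = 0;
  for (std::size_t i = 0; i < strides.size(); ++i) {
    if (strides[i] == elem_sizes[i]) {
      continue;  // this argument is contiguous
    }
    if (strides[i] == 0 && i > 0 && scalar_idx == 0) {
      scalar_idx = i;  // first (and only allowed) scalar input
      continue;
    }
    return 0;  // non-contiguous argument or a second scalar
  }
  return scalar_idx;
}

// Runs the check and hands the resulting index to the callback.
template <typename Callback>
void unroll_checks_sketch(const std::vector<std::int64_t>& strides,
                          const std::vector<std::int64_t>& elem_sizes,
                          Callback&& cb) {
  cb(find_contiguous_scalar(strides, elem_sizes));
}

enum class Loop { Vectorized, Basic };

// Vectorized kernel: the index matters, so the check is worth running.
Loop vectorized_kernel_sketch(const std::vector<std::int64_t>& strides,
                              const std::vector<std::int64_t>& elem_sizes) {
  Loop chosen = Loop::Basic;
  unroll_checks_sketch(strides, elem_sizes, [&](std::size_t idx) {
    chosen = (idx != 0) ? Loop::Vectorized : Loop::Basic;
  });
  return chosen;
}

// Non-vectorized kernel before the change: the callback ignores idx,
// so the outcome is the basic loop no matter what the check reports.
Loop cpu_kernel_before(const std::vector<std::int64_t>& strides,
                       const std::vector<std::int64_t>& elem_sizes) {
  Loop chosen = Loop::Vectorized;
  unroll_checks_sketch(strides, elem_sizes,
                       [&](std::size_t /*idx*/) { chosen = Loop::Basic; });
  return chosen;
}

// Non-vectorized kernel after the change: skip the check entirely.
Loop cpu_kernel_after(const std::vector<std::int64_t>& /*strides*/,
                      const std::vector<std::int64_t>& /*elem_sizes*/) {
  return Loop::Basic;
}
```

In this sketch `cpu_kernel_before` and `cpu_kernel_after` return the same value for every input, which is the redundancy the commit removes; only `vectorized_kernel_sketch` changes its choice based on the index.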
Author
Natalia Gimelshein