[Inductor] Add support for NEON ISA in the Inductor C++ backend (#105590)
Fixes #104729
As suggested in the [blog](https://dev-discuss.pytorch.org/t/torchinductor-update-5-cpu-backend-backend-performance-update-and-deep-dive-on-key-optimizations/1117#:~:text=It%20can%20be,sub%2Dclasses.), I subclassed the `VecISA` class and implemented a NEON version of the `vec_reduce_all()` function, to go along with the existing AVX2 and AVX512 versions. Any operation that calls `vec_reduce_all()` will also take the NEON path and benefit from its vectorization.
`vec_reduce_all()` is invoked by Softmax and by other operations such as norms, so they all benefit from the vectorized path. Taking the fast path yields roughly 30% time savings for Softmax compared to the previously taken slow path.
| | Slow path | Fast path (NEON intrinsics) |
| -- | -- | -- |
| Softmax (100 passes, 1024 dimension) | 623.706 ms | 452.011 ms |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105590
Approved by: https://github.com/jgong5, https://github.com/malfet