SemanticDiff

pytorch
80144d9c - Implement NEON accelerated implementation of ERF() (#105610)

Commit

1 year ago

Implement NEON accelerated implementation of ERF() (#105610)

Fixes #105493

Inspired by the [AVX implementation](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec256/vec256_float.h#L158-L189) for the same.

Perf on a Graviton3 EC2 instance with one OMP thread:

| Operation | std math | SLEEF | NEON (this PR) |
| -- | -- | -- | -- |
| GELU (100 passes) | 1141.897ms | 598.929ms | 515.499ms |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105610
Approved by: https://github.com/jgong5
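For readers unfamiliar with how a fast `erf()` can be built, here is a minimal scalar sketch of one classic approach, the Abramowitz and Stegun polynomial approximation (formula 7.1.26). It is not the code from this PR: the function name `erf_approx` is hypothetical, it uses plain C++ rather than NEON intrinsics, and whether the PR uses exactly this formula and these coefficients is an assumption. A vectorized version would apply the same arithmetic to several lanes at once.

```cpp
#include <cmath>

// Scalar sketch of the Abramowitz-Stegun 7.1.26 approximation of erf(x).
// Hypothetical helper for illustration only; not taken from the PR.
// Documented maximum absolute error of the formula is about 1.5e-7.
inline double erf_approx(double x) {
    const double p  = 0.3275911;
    const double a1 = 0.254829592;
    const double a2 = -0.284496736;
    const double a3 = 1.421413741;
    const double a4 = -1.453152027;
    const double a5 = 1.061405429;

    // The formula is defined for x >= 0; erf is odd, so work on |x|
    // and restore the sign at the end.
    const double ax = std::fabs(x);
    const double t = 1.0 / (1.0 + p * ax);

    // Evaluate a1*t + a2*t^2 + ... + a5*t^5 with Horner's scheme.
    const double poly = ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t;

    const double y = 1.0 - poly * std::exp(-ax * ax);
    return std::copysign(y, x);
}
```

Calling `erf_approx(0.5)` and comparing against `std::erf(0.5)` gives a quick sanity check. The approach trades a small, bounded error for a short sequence of multiply-add operations plus one exponential, which is the kind of work that maps well onto SIMD lanes.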

Author

Rohanjames1997

Committer

pytorchmergebot
