pytorch
d0d673b0 - Improve reciprocal() and rsqrt() accuracy on arm64 (#47478)

Commit
4 years ago
Summary:
Neither `vrecpeq_f32` nor `vrsqrteq_f32` yields an accurate result on its own; each only performs the first of two steps in an iteration of the Newton-Raphson method, as documented at https://developer.arm.com/documentation/dui0472/j/using-neon-support/neon-intrinsics-for-reciprocal-and-sqrt

Use the appropriate NEON instructions to run two more steps of Newton's method to improve the results.

Before:
```
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).reciprocal())"
tensor([0.9980, 0.4990, 0.3330, 0.2495, 0.1997, 0.1665, 0.1426, 0.1248, 0.1108,
        0.0999, 0.0908, 0.0833, 0.0769, 0.0713, 0.0667, 0.0624])
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).rsqrt())"
tensor([0.9980, 0.7051, 0.5762, 0.4990, 0.4463, 0.4082, 0.3779, 0.3525, 0.3330,
        0.3154, 0.3008, 0.2881, 0.2773, 0.2666, 0.2578, 0.2495])
```

After:
```
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).reciprocal())"
tensor([1.0000, 0.5000, 0.3333, 0.2500, 0.2000, 0.1667, 0.1429, 0.1250, 0.1111,
        0.1000, 0.0909, 0.0833, 0.0769, 0.0714, 0.0667, 0.0625])
$ python -c "import torch;print(torch.arange(1.0, 17.0, 1.0, dtype=torch.float32).rsqrt())"
tensor([1.0000, 0.7071, 0.5774, 0.5000, 0.4472, 0.4082, 0.3780, 0.3536, 0.3333,
        0.3162, 0.3015, 0.2887, 0.2774, 0.2673, 0.2582, 0.2500])
```

Partially addresses https://github.com/pytorch/pytorch/issues/47476

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47478

Reviewed By: walterddr

Differential Revision: D24773443

Pulled By: malfet

fbshipit-source-id: 224dca9725601d29fb229f8d71d968a30f25c829
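
For readers unfamiliar with the refinement, here is a minimal pure-Python sketch of the idea, not the actual NEON change. It assumes the usual Newton-Raphson updates for these functions: `x * (2 - d * x)` for the reciprocal and `x * (3 - d * x * x) / 2` for the reciprocal square root. The helper `crude_estimate` is hypothetical; it only mimics a low-precision starting value by truncating the mantissa and does not reproduce the hardware's estimate.

```python
import math


def crude_estimate(value, bits=8):
    """Hypothetical stand-in for a low-precision estimate.

    Keeps only `bits` mantissa bits of `value`; this is an assumption for
    illustration, not the table the hardware uses.
    """
    mantissa, exponent = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def refine_reciprocal(d, steps=2):
    """Approximate 1/d: crude estimate plus `steps` Newton-Raphson updates."""
    x = crude_estimate(1.0 / d)
    for _ in range(steps):
        x = x * (2.0 - d * x)  # Newton step for f(x) = 1/x - d
    return x


def refine_rsqrt(d, steps=2):
    """Approximate 1/sqrt(d): crude estimate plus `steps` Newton-Raphson updates."""
    x = crude_estimate(1.0 / math.sqrt(d))
    for _ in range(steps):
        x = x * (3.0 - d * x * x) / 2.0  # Newton step for f(x) = 1/x**2 - d
    return x


if __name__ == "__main__":
    for d in (2.0, 3.0, 7.0):
        print(d, refine_reciprocal(d, steps=0), refine_reciprocal(d), 1.0 / d)
        print(d, refine_rsqrt(d, steps=0), refine_rsqrt(d), 1.0 / math.sqrt(d))
```

Each step roughly squares the relative error, which is why a couple of steps are enough to take an estimate that is good to only a few digits up to about single-precision accuracy.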