pytorch
1716709d - [CUDA] Use accumulate type to improve accuracy of grid_sample on half precision inputs [v2] (#96586)

Commit
1 year ago
Fixes #96429

This PR is also a follow-up to #90427. In that PR, we also discussed whether the grid index calculations in `grid_sampler_compute_source_index` should also be upcast to `opmath_t` (https://github.com/pytorch/pytorch/pull/90427/files#r1048876708). Because of another unit test failure, we did not upcast those calculations in that PR.

After some investigation, I found that the inaccurate results have nothing to do with the internals of `affine_grid`, even if it is calculated using `double` internally. As long as the input `grid` is passed to `grid_sample` in **half** precision, the results will be less accurate than with a **float** `grid`. This can be verified with a short C++ program like this (compiled once with `TYPE_T` set to `__half` and once with `float`):

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cstdio>  // printf

#ifndef TYPE_T
#define TYPE_T float
#endif

int main() {
    using type_t = TYPE_T;
    // 2/3 rounded to half precision, then stored as type_t.
    type_t d = static_cast<__half>((double)2.0 / 3.0);
    // Same form as the align_corners = False unnormalization,
    // ((x + 1) * size - 1) / 2, with size = 3. The arithmetic runs in float;
    // only the stored result is rounded to type_t.
    type_t s = (((float)d + 1.f) * 3 - 1) / 2;
    printf("%.15f %.15f\n", (double)d, (double)s);
}
```

Outputs are

```
./float.out
0.666503906250000 1.999755859375000
./half.out
0.666503906250000 2.000000000000000
```

To resolve the discussion in https://github.com/pytorch/pytorch/pull/90427/files#r1048876708, I've also increased the test tolerance in the failing unit test `issue_24823_1(torch.half)`.

For the original script in #96429, I got more accurate results with `align_corners = True`:

```
align_corners = True
Expected result has mean absolute value of 0.5285 and maximum absolute value of 3.2067.
Half precision result is off by 0.0001 (0.02%) on average and 0.0010 (0.03%) at maximum.

align_corners = False
Expected result has mean absolute value of 0.5189 and maximum absolute value of 3.0101.
Half precision result is off by 0.0001 (0.02%) on average and 0.0010 (0.03%) at maximum.
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96586
Approved by: https://github.com/ngimel
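The rounding effect shown by the test program in the commit message can also be reproduced on the host, without a CUDA toolchain. The following is a minimal sketch, not code from the PR: `round_to_half` and `source_index` are hypothetical helper names, and the half rounding is emulated by keeping an 11-bit significand, which is assumed to match `__half` only for values in half's normal range (no subnormal or overflow handling).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not a PyTorch or CUDA API): round a float to the
// nearest value with an 11-bit significand. This emulates storing into
// IEEE half precision for values in half's normal range only.
inline float round_to_half(float x) {
    assert(std::isfinite(x));
    if (x == 0.0f) return x;
    int exponent = 0;
    // frexp splits x into mantissa * 2^exponent with |mantissa| in [0.5, 1).
    float mantissa = std::frexp(x, &exponent);
    // Scale to 11 bits and round; nearbyint uses the current rounding mode,
    // which is round-to-nearest-even by default.
    float scaled = std::nearbyint(std::ldexp(mantissa, 11));
    return std::ldexp(scaled, exponent - 11);
}

// Same expression as in the commit's test program, evaluated in float.
inline float source_index(float d) {
    return ((d + 1.0f) * 3 - 1) / 2;
}
```

With `float d = round_to_half(2.0f / 3.0f);`, `source_index(d)` corresponds to the `float` build of the test program (1.999755859375), while `round_to_half(source_index(d))` corresponds to the `__half` build (2.0), where only the stored result is rounded.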