28dc02fe - Accumulate 16-bit float sums in 32-bit accumulators (#60387)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60387

Fixes gh-59489

Using 32-bit accumulators is a win-win: precision improves, and performance improves as well, since the half-precision types had to be converted to 32-bit float and back to do the arithmetic anyway.

Note that for multi-threaded or discontiguous sums, partial sums can be stored in the output, so those partial results are necessarily truncated to 16 bits. Fixing this would require a rework of TensorIterator reductions.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D29447187

Pulled By: ngimel

fbshipit-source-id: d0619e0ca2fe116d101460142b79ca56fd6d0840
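As a rough standalone illustration of why the accumulator width matters (this is not the actual ATen/TensorIterator code path, and the element count and value 0.001 are arbitrary), the sketch below compares a running sum kept in fp16 against one kept in fp32, using NumPy scalar types only to force the rounding at each step:

```python
import numpy as np

N = 20_000
# 20k copies of ~0.001 stored as fp16; the true sum is about 20.
x = np.full(N, 0.001, dtype=np.float16)

# Running sum kept in fp16: once the sum grows large enough that 0.001 is
# smaller than half a unit in the last place, each addition rounds away to
# nothing and the sum stops growing (roughly around 4.0 here).
acc16 = np.float16(0.0)
for v in x:
    acc16 = np.float16(acc16 + v)

# Running sum kept in fp32, truncated to fp16 only at the end: the rounding
# error stays bounded and the result lands near the true value of ~20.
acc32 = np.float32(0.0)
for v in x:
    acc32 += np.float32(v)

print("fp16 accumulator:", acc16)
print("fp32 accumulator:", np.float16(acc32))
```

The same idea explains the performance side of the commit message: since fp16/bfloat16 arithmetic is typically done by widening to fp32 anyway, keeping the running sum in fp32 avoids a round trip through the narrow type on every element.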