Fix FP16 fastAtomicAdd for one case where the tensor start address is not 32-bit aligned (#44642)
Summary:
For https://github.com/pytorch/pytorch/issues/44206 and https://github.com/pytorch/pytorch/issues/42218, I'd like to update the trilinear interpolation backward and `grid_sample` backward kernels to use `fastAtomicAdd`.
As a prelude, I spotted a UB risk in `fastAtomicAdd`. I think the existing code performs a misaligned `__half2` atomicAdd when `index` is odd and `tensor` is not 32-bit aligned (i.e., `index % 2 == 1` and `reinterpret_cast<std::uintptr_t>(tensor) % sizeof(__half2) != 0`). In this case the code concludes we're `!low_bit` and goes down the `!low_bit` code path, but in fact the target element is `low_bit`: its address sits on a 32-bit boundary, so the `!low_bit` path's `__half2` atomicAdd at the preceding element is misaligned. It appears the original [fastAtomicAdd PR](https://github.com/pytorch/pytorch/pull/21879#discussion_r295040377)'s discussion did not consider that case explicitly.
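To make the failure mode concrete, here is a minimal sketch of the idea behind the fix: derive the `__half2` alignment decision from the actual element address rather than from `index` parity alone. This is a simplified hypothetical illustration (the function name `fp16_fast_atomic_add_sketch` and its exact structure are mine, not the verbatim PyTorch source):

```cuda
#include <cuda_fp16.h>
#include <cstdint>

// Sketch of an FP16 fast atomic add that packs scalar adds into aligned
// __half2 atomics. Assumes sm_70+ for the scalar __half atomicAdd fallback.
__device__ inline void fp16_fast_atomic_add_sketch(__half* tensor,
                                                   int index,
                                                   int numel,
                                                   __half value) {
  __half* target_addr = tensor + index;

  // Buggy check (pre-fix, paraphrased): infer alignment from index parity,
  //   bool low_half = (index % 2 == 0);
  // If `tensor` itself starts on a 2-byte (not 4-byte) boundary, an odd
  // `index` actually lands on a 4-byte boundary, so parity misclassifies it.

  // Fixed check: test the element address itself against __half2 alignment.
  bool low_half =
      (reinterpret_cast<std::uintptr_t>(target_addr) % sizeof(__half2) == 0);

  if (low_half && index < (numel - 1)) {
    // target_addr is 4-byte aligned: atomically add {value, 0} as a __half2
    // covering elements [index, index + 1].
    __half2 value2 = __halves2half2(value, __float2half(0.0f));
    atomicAdd(reinterpret_cast<__half2*>(target_addr), value2);
  } else if (!low_half && index > 0) {
    // target_addr - 1 is 4-byte aligned: atomically add {0, value} as a
    // __half2 covering elements [index - 1, index].
    __half2 value2 = __halves2half2(__float2half(0.0f), value);
    atomicAdd(reinterpret_cast<__half2*>(target_addr - 1), value2);
  } else {
    // Boundary element with no aligned __half2 partner in range: fall back
    // to a scalar __half atomicAdd.
    atomicAdd(target_addr, value);
  }
}
```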
I wanted to push my tentative fix for discussion ASAP. Tagging jjsjann123 and mkolod as the original authors of `fastAtomicAdd`. (I'm also curious why we need `reinterpret_cast<std::uintptr_t>(tensor...` for the address modding, but that's minor.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44642
Reviewed By: mruberry
Differential Revision: D23699820
Pulled By: ngimel
fbshipit-source-id: 0db57150715ebb45e6a1fb36897e46f00d61defd