a8429342 - fix mul/div overflow issue on CPU float16 (#98820)

fix mul/div overflow issue on CPU float16 (#98820)

Fixes https://github.com/pytorch/pytorch/issues/63482 and https://github.com/pytorch/pytorch/issues/98691.

The two issues above share the same root cause: **binary ops** create a TensorIterator with the `promote_inputs_to_common_dtype` flag enabled, which converts both input tensors to `common_dtype_` (this logic is bypassed on CUDA). The conversion can overflow on Half: if one of the inputs is a scalar whose absolute value exceeds the float16 maximum (65504), it overflows to inf. This patch fetches the scalar value from `original_tensor_base`, which records the original scalar input, so that in `cpu_kernel_vec` the TensorIterator is treated as a unary op.

Previously, CPU and CUDA behaved differently in this scenario; with this patch they are aligned. Test cases are added for both CPU and CUDA devices. The results follow:

#### before:
```
>>> torch.tensor([3388.], dtype=torch.half).div(524288.0)
tensor([0.], dtype=torch.float16)
>>> torch.tensor([0.01], dtype=torch.float16) * torch.tensor(65536, dtype=torch.float32)
tensor([inf], dtype=torch.float16)
```

#### after:
```
>>> torch.tensor([3388.], dtype=torch.half).div(524288.0)
tensor([0.0065], dtype=torch.float16)
>>> torch.tensor([0.01], dtype=torch.float16) * torch.tensor(65536, dtype=torch.float32)
tensor([655.5000], dtype=torch.float16)
```

The `RRelu` implementation also needs to be updated to store intermediate results in float; otherwise the following test case would fail:
```
. build/bin/test_api --gtest_filter=ModulesTest.RReLU
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98820
Approved by: https://github.com/jgong5, https://github.com/ngimel
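The numeric mechanism behind the bug can be reproduced without PyTorch. The sketch below uses NumPy's `float16` (the same IEEE 754 half-precision format as `torch.float16`) to contrast the pre-fix behavior, where the scalar is first demoted to the common half dtype and overflows to inf, with the idea behind the fix, where the computation runs in full precision and only the result is cast back. This is an illustrative analogy, not the actual TensorIterator code path.

```python
import numpy as np

# IEEE 754 half precision (float16) has a maximum finite value of 65504;
# casting a scalar larger than that down to float16 overflows to inf.
print(np.finfo(np.float16).max)  # 65504.0

x = np.array([3388.0], dtype=np.float16)
s = 524288.0

# Pre-fix CPU behavior (analogy): promoting the scalar to the common
# half dtype first gives float16(524288) == inf, so the division
# collapses to 0.
naive = x / np.float16(s)

# Idea behind the fix (analogy): keep the scalar at full precision,
# compute in float32, and cast only the result back to float16,
# so ~0.00646 survives the round trip.
fixed = (x.astype(np.float32) / np.float32(s)).astype(np.float16)

print(naive)  # [0.]
print(fixed)  # ~0.00646
```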