pytorch
a404cc9a - CUDA `addcmul` and `addcdiv` do math in float for 16 bits I/O (#60715)

Commit

3 years ago

CUDA `addcmul` and `addcdiv` do math in float for 16 bits I/O (#60715)

Summary: Currently, foreach `addcmul` and `addcdiv` cast the scalar to float so that the actual math is done in FP32 when the tensor dtype is Float16/BFloat16, while regular `addcmul` and `addcdiv` do not.

### Reproducible steps to see the behavioral difference

```ipython
In [1]: import torch; torch.__version__
Out[1]: '1.9.0'

In [2]: a, b, c = torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([60000.0], device='cuda', dtype=torch.half), torch.tensor([-1.0], device='cuda', dtype=torch.half)

In [4]: torch.addcmul(a, b, c, value=2)
Out[4]: tensor([-inf], device='cuda:0', dtype=torch.float16)

In [5]: torch._foreach_addcmul([a], [b], [c], value=2)[0]
Out[5]: tensor([-60000.], device='cuda:0', dtype=torch.float16)
```

### How foreach casts

Foreach `addcmul` and `addcdiv` cast the scalar to `opmath_t` (almost equivalent to `acc_type`) here: https://github.com/pytorch/pytorch/blob/42c8439b6eaccf175cceaa820452583e2459a521/aten/src/ATen/native/cuda/ForeachPointwiseOp.cu#L30

and cast inputs and results here: https://github.com/pytorch/pytorch/blob/42c8439b6eaccf175cceaa820452583e2459a521/aten/src/ATen/native/cuda/ForeachFunctors.cuh#L133-L135

Related to https://github.com/pytorch/pytorch/issues/58833 #60227 https://github.com/pytorch/pytorch/issues/60454

cc ptrblck mcarilli ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60715

Reviewed By: albanD

Differential Revision: D29385715

Pulled By: ngimel

fbshipit-source-id: 8bb2db19ab66fc99d686de056a6ee60f9f71d603
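The difference in the reproduction above comes down to where the intermediate product is rounded. The following is a minimal CPU sketch of that arithmetic using NumPy, not the actual CUDA kernels; the helper names `addcmul_native_half` and `addcmul_float_math` are hypothetical and exist only for this illustration. It assumes the operation is evaluated as `a + value * b * c`.

```python
import numpy as np

def addcmul_native_half(a, b, c, value):
    """Illustrative only: every intermediate stays in float16."""
    v = np.float16(value)
    # 2 * 60000 = 120000 exceeds the float16 maximum (65504), so the
    # intermediate product overflows to inf before the sign flip and add.
    with np.errstate(over="ignore"):
        return a + v * b * c

def addcmul_float_math(a, b, c, value):
    """Illustrative only: upcast to float32, compute, then round once."""
    a32, b32, c32 = (x.astype(np.float32) for x in (a, b, c))
    out32 = a32 + np.float32(value) * b32 * c32
    # The final result (-60000) fits in float16, so the downcast is finite.
    return out32.astype(np.float16)

a = np.array([60000.0], dtype=np.float16)
b = np.array([60000.0], dtype=np.float16)
c = np.array([-1.0], dtype=np.float16)

half_result = addcmul_native_half(a, b, c, value=2)
float_result = addcmul_float_math(a, b, c, value=2)

print(half_result)   # [-inf]
print(float_result)  # [-60000.]
```

Both helpers take and return float16 arrays; only the precision of the intermediate math differs, which mirrors the "16 bits I/O, float math" wording in the commit title.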

Author

crcrpar

Committer

facebook-github-bot

Parents

0be65cd5
