resubmit: [mta] APEX style Fused Adam (#81705) (#85507)
This PR implements an APEX style FusedAdam in PyTorch. This is different from the APEX one in that this is compatible with `torch.cuda.amp.GradScaler` by setting `_step_supports_amp_scaling` to `True` and unscales gradients inside its CUDA kernel.
related: https://github.com/pytorch/pytorch/issues/68041, https://github.com/pytorch/pytorch/issues/71274, https://github.com/pytorch/pytorch/issues/80167 possibly related to https://github.com/pytorch/pytorch/issues/80595#issuecomment-1178519436
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81705
Approved by: https://github.com/ngimel
cc @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85507
Approved by: https://github.com/ngimel