`FusedAdam(W)` should take `OptState` into account before unscaling grads (#94060)
The fused optimizers have to consult `OptState` before unscaling gradients, because `GradScaler.unscale_` may already have been called explicitly, e.g. to run `clip_grad_norm_` on unscaled gradients, as described in https://github.com/pytorch/pytorch/blob/e52786f3d177a7ca5d490a516cf52e236ef072cb/torch/cuda/amp/grad_scaler.py#L235-L266 and https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-unscaled-gradients
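For context, a minimal sketch of the usage pattern from the linked AMP docs (the `data` iterable, model, and loss are placeholders): the explicit `scaler.unscale_(optimizer)` means the later `scaler.step(optimizer)` must check `OptState` so the fused step does not unscale the gradients a second time.

```python
import torch

model = torch.nn.Linear(10, 10).cuda()
# fused=True selects the fused Adam path whose step must respect OptState
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in data:  # placeholder data loader
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()

    # Unscale explicitly so clip_grad_norm_ sees unscaled gradients.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    # step() must see (via OptState) that unscale_ already ran and skip
    # unscaling again before the fused optimizer update.
    scaler.step(optimizer)
    scaler.update()
```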
Related #90752
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94060
Approved by: https://github.com/albanD