Stop tracking backward chain of broadcast (ZeRO3) (#5113)
The broadcast that ZeRO3 performs on initialization triggers the autograd warning shown
below. This PR avoids the warning by passing `.data` to the broadcast, so autograd never
tracks a backward chain through the collective.
The same issue in ZeRO 1/2 was addressed in #5075 using `torch.no_grad`, which
affects every line in its scope. For consistency and safety, this PR also switches
the ZeRO 1/2 fix to passing `.data` to the broadcast.
```
/home/mtanaka/.conda/envs/tcomp/lib/python3.9/site-packages/torch/autograd/graph.py:681: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /opt/conda/conda-bld/pytorch_1704786093577/work/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
```
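For illustration, a minimal sketch of the two approaches, assuming an already-initialized `torch.distributed` process group (`broadcast_param` is a hypothetical helper for this example, not DeepSpeed's actual code):

```python
import torch
import torch.distributed as dist

def broadcast_param(param: torch.Tensor, src: int = 0) -> None:
    """Hypothetical helper; requires an initialized process group."""
    # Broadcasting the parameter directly lets autograd attempt to track
    # a backward chain through the c10d collective, which triggers the
    # warning above:
    #   dist.broadcast(param, src=src)
    #
    # Fix from #5075 (ZeRO 1/2): disable tracking for the whole scope.
    #   with torch.no_grad():
    #       dist.broadcast(param, src=src)
    #
    # Fix in this PR: broadcast the underlying tensor via `.data`, which
    # autograd never tracks, without widening a no-grad scope around
    # surrounding lines.
    dist.broadcast(param.data, src=src)
```

Passing `.data` keeps the change local to the single broadcast call, whereas a `torch.no_grad` block silently applies to everything inside it.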