Stop tracking backward chain of broadcast in initialization (#5075)
The DeepSpeed engine generates the following warning upon initialization.
The warning is triggered by a broadcast that synchronizes model
parameters across ranks. Although this behavior is harmless in terms of
both accuracy and, most likely, performance, the warning may confuse
users and could cause compatibility issues with future versions of PyTorch.
This PR runs the broadcast within a `torch.no_grad` context to prevent
tracking of the backward computation chain.
```
/home/aiscuser/.conda/envs/wbcast/lib/python3.9/site-packages/torch/autograd/__init__.py:266: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /opt/conda/conda-bld/pytorch_1704987277512/work/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
```
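For illustration, a minimal sketch of the technique: wrapping the parameter broadcast in a `torch.no_grad()` context so autograd does not record the collective. The helper name `broadcast_parameters`, the `src_rank` argument, and the standalone structure here are illustrative assumptions, not the actual DeepSpeed code; only `torch.distributed.broadcast` and `torch.no_grad` are real PyTorch APIs.

```python
import torch
import torch.distributed as dist


def broadcast_parameters(module: torch.nn.Module, src_rank: int = 0) -> None:
    """Synchronize module parameters from src_rank to all other ranks.

    Hypothetical sketch: the broadcast is executed under torch.no_grad()
    so that the c10d collective is not tracked in the autograd graph,
    which avoids the UserWarning shown above. Assumes the default
    process group has already been initialized via dist.init_process_group.
    """
    with torch.no_grad():
        for param in module.parameters():
            # In-place broadcast of the parameter data; no backward
            # chain is recorded because grad tracking is disabled.
            dist.broadcast(param, src=src_rank)
```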
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>