Stop tracking backward chain of broadcast (ZeRO3) (#5113)
The broadcast that ZeRO3 performs on initialization triggers the autograd warning shown
below. This PR avoids the warning by passing `.data` to the broadcast, so autograd never
tracks a backward chain through the collective.
The same issue in ZeRO 1/2 was addressed in #5075 using `torch.no_grad`, which
affects every line in its scope. For consistency and safety, this PR also switches
the ZeRO 1/2 fix to passing `.data` to the broadcast.
```
/home/mtanaka/.conda/envs/tcomp/lib/python3.9/site-packages/torch/autograd/graph.py:681: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at /opt/conda/conda-bld/pytorch_1704786093577/work/torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
```
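For illustration, a minimal sketch of the two approaches, assuming an already-initialized `torch.distributed` process group (`broadcast_param` is a hypothetical helper for this example, not DeepSpeed's actual code):

```python
import torch
import torch.distributed as dist

def broadcast_param(param: torch.Tensor, src: int = 0) -> None:
    """Hypothetical helper; requires an initialized process group."""
    # Broadcasting the parameter directly lets autograd attempt to track
    # a backward chain through the c10d collective, which triggers the
    # warning above:
    #   dist.broadcast(param, src=src)
    #
    # Fix from #5075 (ZeRO 1/2): disable tracking for the whole scope.
    #   with torch.no_grad():
    #       dist.broadcast(param, src=src)
    #
    # Fix in this PR: broadcast the underlying tensor via `.data`, which
    # autograd never tracks, without widening a no-grad scope around
    # surrounding lines.
    dist.broadcast(param.data, src=src)
```

Passing `.data` keeps the change local to the single broadcast call, whereas a `torch.no_grad` block silently applies to everything inside it.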