DDP: Get static graph to print unused parameters in debug mode. (#81929)
Fixes #68833
Test Toy Model:
```
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super(ToyModel, self).__init__()
        # net1.weight and bias are unused parameters: forward() only uses net2.
        self.net1 = nn.Linear(10, 5, bias=False)
        self.bias = nn.Parameter(torch.zeros(5))
        self.net2 = nn.Linear(10, 5)

    def forward(self, x):
        return self.net2(x).sum()
```
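For reference, a minimal driver sketch that exercises this code path (the script structure, file name, and launch command below are illustrative, not part of this PR):
```
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# ToyModel is the module defined above.

def main():
    # torchrun supplies MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE via env://.
    dist.init_process_group("gloo")
    model = ToyModel()
    # static_graph=True tells DDP that the set of used and unused parameters
    # will not change after the first iteration.
    ddp_model = DDP(model, static_graph=True)
    # First forward/backward: the reducer detects the unused parameters
    # and, with this PR, logs them by name in debug mode.
    ddp_model(torch.randn(20, 10)).backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
Launched, for example, as `TORCH_CPP_LOG_LEVEL=0 TORCH_DISTRIBUTED_DEBUG=INFO torchrun --nproc_per_node=4 toy.py`.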
With this PR, the following output can be observed when running on 4 ranks with `export TORCH_CPP_LOG_LEVEL=0` and `export TORCH_DISTRIBUTED_DEBUG=INFO` (or `export TORCH_DISTRIBUTED_DEBUG=DETAIL`), and with the model wrapped in `torch.nn.parallel.DistributedDataParallel` with `static_graph=True`:
[I reducer.cpp:578] [Rank 0]: Parameter(s) (in the format of {param_name, index}): {.bias,0}{net1.weight,1} is(are) unused during first iteration. Since static_graph=True is enabled for DDP, we expect this set of unused parameters to remain consistent on this rank throughout the training.
[I reducer.cpp:578] [Rank 3]: Parameter(s) (in the format of {param_name, index}): {.bias,0}{net1.weight,1} is(are) unused during first iteration. Since static_graph=True is enabled for DDP, we expect this set of unused parameters to remain consistent on this rank throughout the training.
[I reducer.cpp:578] [Rank 2]: Parameter(s) (in the format of {param_name, index}): {.bias,0}{net1.weight,1} is(are) unused during first iteration. Since static_graph=True is enabled for DDP, we expect this set of unused parameters to remain consistent on this rank throughout the training.
[I reducer.cpp:578] [Rank 1]: Parameter(s) (in the format of {param_name, index}): {.bias,0}{net1.weight,1} is(are) unused during first iteration. Since static_graph=True is enabled for DDP, we expect this set of unused parameters to remain consistent on this rank throughout the training.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81929
Approved by: https://github.com/rohan-varma, https://github.com/malfet