Support complicated use cases with TiedLayerSpec (#7208)
I want to reuse a composed module in the pipeline. For example, the
following `MyModule` has a member `linear`, which is also a module.
```python
class MyModule(torch.nn.Module):
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_in, n_out)
        self.layer_norm = torch.nn.LayerNorm(n_out)

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        hidden = self.linear(data)
        hidden = self.layer_norm(hidden)
        return hidden
```
`MyModule.linear.weight` should be synchronized among the ranks that share
it, so I add `linear.weight` to `TiedLayerSpec.tied_weight_attr`. In my
case, I generate the whole `tied_weight_attr` list with the following
expression:
```python
tied_weight_attr = [name for name, p in layer.named_parameters() if p.numel() > 1]
```
However, the builtin `getattr` used by `PipelineModule` cannot resolve a
nested attribute path like `linear.weight`: it treats the whole string as a
single attribute name.
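For illustration, the limitation can be reproduced with plain Python objects (the class names below are stand-ins, not DeepSpeed code):

```python
class Inner:
    def __init__(self):
        self.weight = "tied-parameter"

class Outer:
    def __init__(self):
        self.linear = Inner()

module = Outer()

# Builtin getattr looks up an attribute literally named "linear.weight",
# which does not exist, so the default is returned:
missing = getattr(module, "linear.weight", None)  # -> None

# Chaining the lookups works; this is exactly what a recursive
# getattr automates:
found = getattr(getattr(module, "linear"), "weight")  # -> "tied-parameter"
```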
Hence, this PR first extends the builtin `getattr` to a recursive version,
`PipelineModule._recursive_getattr`, which resolves the path one attribute
segment at a time.
Meanwhile, the order in which tied weights are synchronized matters: every
rank must issue the collectives for tied groups in the same order. This PR
therefore sorts `tie_keys` in `PipelineModule._index_tied_modules` to avoid
hanging.
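The deadlock scenario can be sketched with plain dicts (the key names and group contents below are hypothetical): if two ranks insert tied-weight groups in different orders, plain dict iteration disagrees across ranks and the matching collectives never line up, while sorting restores agreement.

```python
# Hypothetical tied-weight groups as two ranks might build them;
# insertion order differs, so plain iteration order differs too.
rank0_ties = {"embed.weight": [0, 3], "lm_head.weight": [3]}
rank1_ties = {"lm_head.weight": [3], "embed.weight": [0, 3]}

assert list(rank0_ties) != list(rank1_ties)      # mismatched iteration order
assert sorted(rank0_ties) == sorted(rank1_ties)  # sorting restores agreement
```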
Signed-off-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>