Support complicated use cases with TiedLayerSpec (#7208)
I want to reuse a composed module in the pipeline. For example, the
following `MyModule` has a member `linear`, which is also a module.
```python
class MyModule(torch.nn.Module):
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_in, n_out)
        self.layer_norm = torch.nn.LayerNorm(n_out)

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        hidden = self.linear(data)
        hidden = self.layer_norm(hidden)
        return hidden
```
`MyModule.linear.weight` should be synchronized among the ranks that share
it, so I add `linear.weight` to `TiedLayerSpec.tied_weight_attr`. In my
case, I generate the whole `tied_weight_attr` list with the following
expression:
```python
tied_weight_attr = [name for name, p in layer.named_parameters() if p.numel() > 1]
```
However, the builtin `getattr` used by `PipelineModule` cannot resolve a
nested attribute path like `linear.weight`: it treats the whole string as a
single attribute name.
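For illustration, the limitation can be reproduced with plain Python objects (the class names below are stand-ins, not DeepSpeed code):

```python
class Inner:
    def __init__(self):
        self.weight = "tied-parameter"

class Outer:
    def __init__(self):
        self.linear = Inner()

module = Outer()

# Builtin getattr looks up an attribute literally named "linear.weight",
# which does not exist, so the default is returned:
missing = getattr(module, "linear.weight", None)  # -> None

# Chaining the lookups works; this is exactly what a recursive
# getattr automates:
found = getattr(getattr(module, "linear"), "weight")  # -> "tied-parameter"
```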
Hence, this PR first extends the builtin `getattr` to a recursive version,
`PipelineModule._recursive_getattr`, which resolves the path one attribute
segment at a time.
Meanwhile, the order in which tied weights are synchronized matters: every
rank must issue the collectives for tied groups in the same order. This PR
therefore sorts `tie_keys` in `PipelineModule._index_tied_modules` to avoid
hanging.
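The deadlock scenario can be sketched with plain dicts (the key names and group contents below are hypothetical): if two ranks insert tied-weight groups in different orders, plain dict iteration disagrees across ranks and the matching collectives never line up, while sorting restores agreement.

```python
# Hypothetical tied-weight groups as two ranks might build them;
# insertion order differs, so plain iteration order differs too.
rank0_ties = {"embed.weight": [0, 3], "lm_head.weight": [3]}
rank1_ties = {"lm_head.weight": [3], "embed.weight": [0, 3]}

assert list(rank0_ties) != list(rank1_ties)      # mismatched iteration order
assert sorted(rank0_ties) == sorted(rank1_ties)  # sorting restores agreement
```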
Signed-off-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>