DeepSpeed
2bdf061f - [BUG] partition_balanced return wrong result. (#4312)

Commit
2 years ago
# Background

In pipeline parallelism, DeepSpeed uses `ds_utils.partition_balanced` to balance the partitioning of the model according to the number of parameters or class names.

https://github.com/microsoft/DeepSpeed/blob/581e44dd1ab3c409a5905335867c761d5cb4db5b/deepspeed/runtime/pipe/module.py#L380-L395

# What's wrong?

```
>>> import deepspeed
>>> deepspeed.__version__
'0.10.3+542dc0d5'
>>> from deepspeed.runtime import utils as ds_utils
>>> ds_utils.partition_balanced([1, 1, 1, 1, 1], 4)
[0, 2, 4, 5, 5]
>>>
```

The result `[0, 2, 4, 5, 5]` is a list of part boundaries, so the four parts receive `[2, 2, 1, 0]` layers, which is not balanced at all. The last part will throw an exception because it has no parameters to train.

I added some unit tests for this function, and I will fix it later if anyone needs it.

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
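
To make the boundary-list reading concrete, here is a minimal sketch that converts a boundary list into per-part layer counts and compares the reported output against an exhaustive reference. The helpers `part_sizes`, `part_weights`, and `brute_force_partition` are hypothetical names introduced for illustration; they are not part of DeepSpeed and do not reflect how `partition_balanced` is implemented or how it was later fixed. The reference simply tries every way to cut the list into non-empty contiguous parts and keeps the one with the lightest heaviest part, which is only practical for small inputs.

```python
from itertools import combinations


def part_sizes(bounds):
    """Number of items in each part, given boundaries like [0, 2, 4, 5, 5]."""
    return [end - start for start, end in zip(bounds, bounds[1:])]


def part_weights(weights, bounds):
    """Total weight of each contiguous part described by the boundaries."""
    return [sum(weights[start:end]) for start, end in zip(bounds, bounds[1:])]


def brute_force_partition(weights, num_parts):
    """Hypothetical reference (not DeepSpeed's algorithm).

    Tries every placement of num_parts - 1 cut points so that each part is
    non-empty, and returns the boundaries that minimize the heaviest part.
    """
    n = len(weights)
    if not 1 <= num_parts <= n:
        raise ValueError("need between 1 and len(weights) parts")
    best_cost, best_bounds = None, None
    for cuts in combinations(range(1, n), num_parts - 1):
        bounds = [0, *cuts, n]
        cost = max(part_weights(weights, bounds))
        if best_cost is None or cost < best_cost:
            best_cost, best_bounds = cost, bounds
    return best_bounds


# Output reported in this commit message: the last part is empty.
reported = [0, 2, 4, 5, 5]
print(part_sizes(reported))  # [2, 2, 1, 0]

# One balanced alternative with no empty part.
reference = brute_force_partition([1, 1, 1, 1, 1], 4)
print(reference, part_sizes(reference))
```

Both results have a heaviest part of weight 2, so minimizing the bottleneck alone does not rule out an empty part; a unit test for this case would also need to check that every part is non-empty.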