DeepSpeed
1ab1928d - Enable dynamic shapes for pipeline parallel engine inputs (#5481)

Commit

316 days ago

Enable dynamic shapes for pipeline parallel engine inputs (#5481)

This PR enables dynamic shapes for inputs to the pipeline parallel (PP) engine. Currently, the PP engine checks tensor shapes and allocates communication buffers only during the first forward/backward passes. This causes a tensor shape mismatch error when input tensor shapes change. This PR adds an option to check tensor shapes at every iteration and allocate buffers based on those shapes.

As shown below, you can enable this feature by passing `dynamic_shape=True` to `PipelineModule`. Note that this might have a performance impact, so the option is set to `False` by default.

```python
model = PipelineModule(
    ...
    dynamic_shape=True
)
```

Enabling the option increases the overhead of buffer allocation and of communicating tensor metadata. To mitigate the overhead, this PR also includes these improvements:

- Consolidate multiple communication calls to send/recv tensor shapes (9f96ad4049b1fb63d38a1a090480dbef61dc0490)
- Reuse (extend) the communication buffer instead of creating a new one (b3c07504be05b08772f6297a864ec6c27b5eeca3)

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
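The two mitigations named in the commit message can be illustrated with a small, framework-free sketch. This is not the DeepSpeed implementation: `pack_shapes`, `unpack_shapes`, and `GrowOnlyBuffer` are hypothetical names, and a plain `bytearray` stands in for a device communication buffer. It only shows the general idea of packing the shapes of several tensors into one flat integer message (so one send/recv can carry all metadata) and of growing a buffer only when a larger input arrives.

```python
from math import prod
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def pack_shapes(shapes: Sequence[Shape]) -> List[int]:
    """Flatten several shapes into one integer list.

    Layout (an assumption for this sketch):
    [num_tensors, ndim_0, dims_0..., ndim_1, dims_1..., ...]
    """
    packed = [len(shapes)]
    for shape in shapes:
        packed.append(len(shape))
        packed.extend(shape)
    return packed


def unpack_shapes(packed: Sequence[int]) -> List[Shape]:
    """Recover the list of shapes from the flat layout used by pack_shapes."""
    count, pos = packed[0], 1
    shapes: List[Shape] = []
    for _ in range(count):
        ndim = packed[pos]
        shapes.append(tuple(packed[pos + 1:pos + 1 + ndim]))
        pos += 1 + ndim
    return shapes


class GrowOnlyBuffer:
    """Hypothetical receive buffer that is extended in place, never shrunk."""

    def __init__(self) -> None:
        self.storage = bytearray()
        self.allocations = 0  # how many times the buffer had to grow

    def reserve(self, shape: Shape, itemsize: int = 4) -> int:
        """Ensure capacity for a tensor of `shape`; return the bytes needed."""
        needed = prod(shape) * itemsize
        if needed > len(self.storage):
            # Extend only by the missing amount instead of reallocating.
            self.storage.extend(bytes(needed - len(self.storage)))
            self.allocations += 1
        return needed


if __name__ == "__main__":
    message = pack_shapes([(4, 128, 512), (4, 128)])
    print(message, unpack_shapes(message))

    buf = GrowOnlyBuffer()
    for shape in [(2, 8), (1, 8), (4, 8)]:
        buf.reserve(shape)
    print(buf.allocations, len(buf.storage))
```

In this sketch a smaller input reuses the existing storage, so the allocation cost is paid only when the largest shape seen so far is exceeded.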

References

#5481 - Enable dynamic shapes for pipeline parallel engine inputs

Author

tohtana

Parents

4d4ff0ed

Files (5)

deepspeed/runtime/pipe
- engine.py
- module.py
tests/unit
- alexnet_model.py
- runtime/pipe
  - test_pipe.py
- util.py
