DeepSpeed
1ab1928d - Enable dynamic shapes for pipeline parallel engine inputs (#5481)

Commit
316 days ago
This PR enables dynamic shapes for inputs to the pipeline parallel (PP) engine. Currently, the PP engine checks tensor shapes and allocates communication buffers only during the first forward/backward passes. This causes a tensor shape mismatch error when input tensor shapes change in later iterations. This PR adds an option to check tensor shapes at every iteration and allocate buffers based on those shapes.

As shown below, you can enable this feature by passing `dynamic_shape=True` to `PipelineModule`. Note that this might have a performance impact, so the option is set to `False` by default.

```python
model = PipelineModule(
    ...
    dynamic_shape=True
)
```

Enabling the option increases the overhead of buffer allocation and of communicating tensor metadata. To mitigate the overhead, this PR also includes these improvements:

- Consolidate multiple communication calls to send/recv tensor shapes (9f96ad4049b1fb63d38a1a090480dbef61dc0490)
- Reuse (extend) the communication buffer instead of creating a new one (b3c07504be05b08772f6297a864ec6c27b5eeca3)

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
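The two mitigations can be illustrated with a minimal, framework-free sketch. This is not DeepSpeed's implementation: `pack_shapes`, `unpack_shapes`, and `GrowableBuffer` are hypothetical names, and plain Python lists and a `bytearray` stand in for tensors and communication buffers. The idea is that all shapes travel in one flat metadata message instead of one message per tensor, and that the receive buffer is reallocated only when a larger payload arrives.

```python
def pack_shapes(shapes):
    """Flatten several shapes into one list: [count, ndim0, *dims0, ndim1, *dims1, ...]."""
    meta = [len(shapes)]
    for shape in shapes:
        meta.append(len(shape))
        meta.extend(shape)
    return meta


def unpack_shapes(meta):
    """Inverse of pack_shapes: recover the list of shape tuples."""
    count, pos = meta[0], 1
    shapes = []
    for _ in range(count):
        ndim = meta[pos]
        shapes.append(tuple(meta[pos + 1:pos + 1 + ndim]))
        pos += 1 + ndim
    return shapes


class GrowableBuffer:
    """Hypothetical stand-in for a communication buffer that is reused across iterations."""

    def __init__(self):
        self.storage = bytearray()
        self.allocations = 0  # counts how often the backing storage had to grow

    def view(self, numel, itemsize=4):
        needed = numel * itemsize
        if needed > len(self.storage):
            # Grow only when the current storage is too small.
            self.storage = bytearray(needed)
            self.allocations += 1
        # Smaller requests reuse a prefix of the existing storage.
        return memoryview(self.storage)[:needed]


# One metadata message describes all tensors of the current iteration.
message = pack_shapes([(2, 128, 64), (2, 128)])
received = unpack_shapes(message)

buf = GrowableBuffer()
for shape in received:
    numel = 1
    for dim in shape:
        numel *= dim
    buf.view(numel)
```

In this sketch the second, smaller tensor reuses the storage allocated for the first, so `buf.allocations` stays at 1; a real engine would additionally have to account for dtypes and devices.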
  • deepspeed/runtime/pipe
    • engine.py
    • module.py
  • tests/unit
    • alexnet_model.py
    • runtime/pipe
      • test_pipe.py
    • util.py