Enable dynamic shapes for pipeline parallel engine inputs (#5481)
This PR enables dynamic shapes for inputs to the pipeline parallel (PP)
engine.
Currently, the PP engine checks tensor shapes and allocates communication
buffers only during the first forward/backward pass. This causes a tensor
shape mismatch error when input tensor shapes change in later iterations.
This PR adds an option to check tensor shapes at every iteration and
allocate buffers based on those shapes. As shown below, you can enable this
feature by passing `dynamic_shape=True` to `PipelineModule`.
Note that this might have a performance impact, so the option defaults to
`False`.
```python
model = PipelineModule(
    ...
    dynamic_shape=True
)
```
Enabling this option increases the overhead of buffer allocation and of
communicating tensor metadata. To mitigate the overhead, this PR also
includes these improvements:
- Consolidate multiple communication calls to send/recv tensor shapes
9f96ad4049b1fb63d38a1a090480dbef61dc0490
- Reuse (extend) the communication buffer instead of creating a new one
b3c07504be05b08772f6297a864ec6c27b5eeca3
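
The two mitigations above can be illustrated with a minimal, framework-free
sketch. This is not the code from the commits listed; it uses plain Python
lists and a `bytearray` in place of tensors and collective calls, and the
names `pack_shapes`, `unpack_shapes`, and `GrowableBuffer` are hypothetical.
It assumes that all shapes for one iteration can be packed into a single
flat integer message, and that a receive buffer only needs to be reallocated
when a request exceeds its current capacity.

```python
from typing import List, Sequence, Tuple


def pack_shapes(shapes: Sequence[Tuple[int, ...]]) -> List[int]:
    """Flatten several shapes into one message.

    Layout (an assumption for this sketch):
    [num_tensors, ndim_0, *dims_0, ndim_1, *dims_1, ...]
    so one send can replace a separate send per tensor.
    """
    meta = [len(shapes)]
    for shape in shapes:
        meta.append(len(shape))
        meta.extend(shape)
    return meta


def unpack_shapes(meta: Sequence[int]) -> List[Tuple[int, ...]]:
    """Inverse of pack_shapes, run on the receiving side."""
    count, pos = meta[0], 1
    shapes = []
    for _ in range(count):
        ndim = meta[pos]
        shapes.append(tuple(meta[pos + 1 : pos + 1 + ndim]))
        pos += 1 + ndim
    return shapes


class GrowableBuffer:
    """Keeps one backing buffer and grows it only when a request is larger."""

    def __init__(self) -> None:
        self._storage = bytearray()
        self.allocations = 0  # counts how often a new buffer was created

    def view(self, nbytes: int) -> memoryview:
        if nbytes > len(self._storage):
            # Extend: allocate once for the larger size, then keep reusing it.
            self._storage = bytearray(nbytes)
            self.allocations += 1
        # Smaller requests are served as a slice of the existing storage.
        return memoryview(self._storage)[:nbytes]
```

In this sketch, a smaller input after a larger one triggers no new
allocation, so the allocation cost is paid only when the largest shape seen
so far is exceeded.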
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>