Support custom partitioning patterns for AutoTP (#7806)
This PR introduces a flexible, configuration-driven API for AutoTP
(Automatic Tensor Parallelism) that allows users to define custom layer
partitioning patterns for training.
@inkcherry @delock
## Motivation
Previously, AutoTP relied on hardcoded layer detection logic that was
difficult to customize for new model architectures. This PR enables:
1. **Custom models**: Users can define exact regex patterns to match
their model's parameter names
2. **Fused layers**: Support for fused QKV, gate_up_proj, and other
packed weight matrices with unequal sub-parameter sizes (e.g., GQA with
different Q/K/V dimensions)
3. **Extensibility**: Easy to add new model presets or customize
existing ones
Here is an example of a config including custom partitioning patterns:
```json
{
  "tensor_parallel": {
    "autotp_size": 4,
    "partition_config": {
      "use_default_specs": false,
      "layer_specs": [
        {
          "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
          "partition_type": "row"
        },
        {
          "patterns": [".*\\.[qkv]_proj\\.weight$"],
          "partition_type": "column"
        },
        {
          "patterns": [".*\\.gate_up_proj\\.weight$"],
          "partition_type": "column",
          "shape": [2, -1],
          "partition_dim": 0
        }
      ]
    }
  }
}
```
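To make the pattern semantics concrete, here is a small, purely illustrative Python snippet (not the actual AutoTP matching code) that applies the regexes above to typical Llama-style parameter names; the parameter names and matching logic are assumptions for illustration only.

```python
import re

# Illustrative only: how the regex patterns above would classify
# typical Llama-style parameter names.
layer_specs = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"],
     "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"],
     "partition_type": "column"},
    {"patterns": [r".*\.gate_up_proj\.weight$"],
     "partition_type": "column"},
]

for name in [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate_up_proj.weight",
]:
    spec = next(s for s in layer_specs
                if any(re.match(p, name) for p in s["patterns"]))
    print(f"{name} -> {spec['partition_type']}")
```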
Refer to the
[documentation](https://github.com/tohtana/DeepSpeed/blob/tohtana/autotp_custom_patterns/docs/code-docs/source/training.rst)
for more details, including the preset models and how to define partitioning
for fused layers.
We also opened a
[PR](https://github.com/deepspeedai/DeepSpeedExamples/pull/998) in
DeepSpeedExamples that demonstrates the usage.
## Simplified initialization step
AutoTP previously required calling ``set_autotp_mode(training=True)``
and ``deepspeed.tp_model_init`` before ``deepspeed.initialize``. Now all
the necessary settings can be included in the DeepSpeed config. The
traditional initialization path is still supported for backward
compatibility.
When both are used (i.e. ``set_autotp_mode(training=True)`` and
``deepspeed.tp_model_init`` are called and the config is also passed to
``deepspeed.initialize``), the settings are merged at initialization. If
the settings conflict, an error is raised.
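A rough sketch of the config-only path follows; the model loader and
``train_batch_size`` are placeholders, and only the ``tensor_parallel``
section mirrors the config shown above.

```python
import deepspeed

# Hypothetical helper: stands in for whatever builds your torch.nn.Module
# (e.g., a Hugging Face transformer whose parameter names match the patterns).
model = load_my_model()

ds_config = {
    "train_batch_size": 8,  # placeholder training settings
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": False,
            "layer_specs": [
                {"patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                 "partition_type": "row"},
                {"patterns": [".*\\.[qkv]_proj\\.weight$"],
                 "partition_type": "column"},
            ],
        },
    },
}

# No set_autotp_mode(training=True) or deepspeed.tp_model_init() call is
# needed on this path; the tensor_parallel section of the config drives AutoTP.
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)
```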
---------
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>