DeepSpeed
a240c4da - Add HuggingFace tp_plan support for AutoTP (#7901)

16 days ago
## Summary

Adds automatic detection and use of HuggingFace's built-in `base_model_tp_plan` for AutoTP, addressing the HuggingFace tp_plan support item from #7861. Models that ship with a `tp_plan` (e.g. Llama, Qwen, Gemma2) now work with AutoTP out of the box: no `preset_model` or `partition_config` needed, just set `autotp_size`.

## Changes

**Runtime**
- `engine.py`: Added a tp_plan fallback in `_apply_autotp_partitioning`. Priority order: `partition_config` > HF `tp_plan` > AutoTP heuristics.
- `config.py`: Added `_get_hf_tp_plan(model)` to extract the tp_plan from `model._tp_plan` or `model.config.base_model_tp_plan`.
- `tp_plan_converter.py`: New file. `TPPlanConverter` converts HF tp_plan entries (`colwise`/`rowwise`) to DeepSpeed `TPLayerSpec`. Other HF partition types (`colwise_rep`, `local_colwise`, etc.) are not yet supported (documented with a TODO).

**Tests** (11 files, 17 CPU + 5 GPU tests)
- `test_tp_plan_converter.py`: Unit tests for the converter (alternate prefixes, projection names, unsupported types, etc.).
- `test_tp_plan_extraction.py`: Unit tests for `_get_hf_tp_plan` with mock models.
- `test_tp_plan_e2e.py`: GPU e2e tests with ZeRO 0/1/2 (requires 2 GPUs).
- `test_tp_plan_real_models.py`: GPU tests with Qwen2 and custom models (requires 2 GPUs).

**Documentation**
- Tutorial: New "HuggingFace tp_plan Support" section in `autotp-training.md`.
- Config reference: Added a tp_plan paragraph in `config-json.md`.
- API docs: Added a tp_plan subsection in `training.rst`.
- Blog: Updated ongoing work in `blogs/huggingface-tp/README.md`.

## Limitations

- Only `colwise` and `rowwise` partition types are supported. Extended types (`colwise_rep`, `local_colwise`, `local_rowwise`, `local_packed_rowwise`, `gather`, `sequence_parallel`) are deferred.
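The extraction step described above (checking `model._tp_plan` first, then `model.config.base_model_tp_plan`) can be sketched as follows. This is a simplified illustration of the helper's described behavior, not the actual DeepSpeed implementation:

```python
def _get_hf_tp_plan(model):
    """Illustrative sketch: return a HuggingFace tp_plan dict if the model
    carries one, otherwise None.

    Checks the instance-level plan first, then falls back to the
    config-level base_model_tp_plan.
    """
    plan = getattr(model, "_tp_plan", None)
    if plan:
        return plan
    config = getattr(model, "config", None)
    return getattr(config, "base_model_tp_plan", None)
```

Returning `None` when neither attribute is present lets the caller fall through to the next option in the priority chain.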
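The priority order stated for `_apply_autotp_partitioning` (`partition_config` > HF `tp_plan` > AutoTP heuristics) amounts to a simple first-match fallback. A minimal sketch, with the function name and sentinel return values invented for illustration:

```python
def resolve_partitioning(partition_config, hf_tp_plan):
    """Illustrative fallback chain for choosing a partitioning source.

    An explicit partition_config always wins; otherwise a HuggingFace
    tp_plan is used if the model ships one; otherwise AutoTP's built-in
    heuristics apply (represented here by a sentinel tag).
    """
    if partition_config is not None:
        return ("partition_config", partition_config)
    if hf_tp_plan:
        return ("hf_tp_plan", hf_tp_plan)
    return ("autotp_heuristics", None)
```

This ordering means adding tp_plan support cannot change behavior for users who already pass a `partition_config`.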
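The converter's described contract (accept `colwise`/`rowwise`, reject everything else) can be sketched like this. `LayerSpec` below is a stand-in for DeepSpeed's `TPLayerSpec`; its field names and the error behavior are assumptions for illustration:

```python
from dataclasses import dataclass

SUPPORTED_STYLES = {"colwise", "rowwise"}

@dataclass
class LayerSpec:
    """Stand-in for DeepSpeed's TPLayerSpec (illustrative fields)."""
    pattern: str  # module name pattern from the HF tp_plan, e.g. "model.layers.*.self_attn.q_proj"
    style: str    # "colwise" or "rowwise"

def convert_tp_plan(tp_plan):
    """Convert supported HF tp_plan entries to layer specs.

    Unsupported partition types (colwise_rep, local_colwise, local_rowwise,
    local_packed_rowwise, gather, sequence_parallel) raise, so the caller
    can fall back to AutoTP heuristics instead.
    """
    specs = []
    for pattern, style in tp_plan.items():
        if style not in SUPPORTED_STYLES:
            raise ValueError(f"Unsupported tp_plan partition type: {style!r}")
        specs.append(LayerSpec(pattern, style))
    return specs
```

Raising on unsupported types rather than silently skipping them keeps partial plans from producing inconsistent shardings.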
--------- Signed-off-by: Guokai Ma <guokai.ma@intel.com> Signed-off-by: Ma, Guokai <guokai.ma@gmail.com> Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com> Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>