transformers
FSDP + TP & native save/load distributed
#45028
Merged

3outeille merged 133 commits into main from refactor-tp-dtensor
3outeille
3outeille init
7c843391
3outeille Merge branch 'main' into distributed_api
69bc48e1
HuggingFaceDocBuilderDev
3outeille force pushed from ccd06bfe to fcea5cec 75 days ago
3outeille force pushed from fcea5cec to f98e208f 75 days ago
3outeille commented on 2026-04-08
3outeille Merge branch 'main' into distributed_api
b7ec958e
3outeille Merge remote-tracking branch 'origin/main' into distributed_api
45a01a54
3outeille FSDP2 (fully_shard) integration
a5c25548
3outeille DistributedConfig + shard-on-read loading
739332cd
3outeille force-pushed the fsdp-core-model-loading branch from 607cc114 to 739332cd 66 days ago
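Shard-on-read loading means each rank materializes only its own slice of a checkpoint tensor instead of the full tensor. The sketch below is a minimal illustration, assuming torch.chunk-style splitting (ceil-sized chunks, with trailing ranks possibly smaller or empty, which is how a `Shard(dim)` placement divides a dimension). `shard_bounds` and `read_shard` are hypothetical helpers, not the PR's code. With safetensors, the same bounds could be passed to a lazy `get_slice` so the full tensor never lands in memory.

```python
import numpy as np

def shard_bounds(dim_size: int, rank: int, world_size: int) -> tuple[int, int]:
    """Return the [start, stop) range owned by `rank` along a dimension of length `dim_size`.

    Mirrors torch.chunk semantics: chunks have ceil(dim_size / world_size) elements,
    the last non-empty chunk may be shorter, and trailing ranks may get nothing.
    """
    chunk = -(-dim_size // world_size)  # ceiling division
    start = min(rank * chunk, dim_size)
    stop = min(start + chunk, dim_size)
    return start, stop

def read_shard(full: np.ndarray, dim: int, rank: int, world_size: int) -> np.ndarray:
    # Hypothetical stand-in for a lazy checkpoint read: only the rank's slice is copied.
    start, stop = shard_bounds(full.shape[dim], rank, world_size)
    index = [slice(None)] * full.ndim
    index[dim] = slice(start, stop)
    return full[tuple(index)].copy()

weight = np.arange(10 * 4).reshape(10, 4)
shards = [read_shard(weight, dim=0, rank=r, world_size=4) for r in range(4)]
print([s.shape for s in shards])  # [(3, 4), (3, 4), (3, 4), (1, 4)]

# Concatenating the per-rank slices in rank order recovers the original tensor.
assert np.array_equal(np.concatenate(shards, axis=0), weight)
```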
3outeille TPStyle API + dense model tensor parallelism
11b55a20
3outeille force pushed from 1aa7f5f1 to 11b55a20 66 days ago
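Dense-model tensor parallelism rests on two ways of splitting an `nn.Linear`. Column-wise sharding splits the output features, so each rank produces a slice of the output. Row-wise sharding splits the input features, so each rank produces a partial sum that an all-reduce adds up. The single-process NumPy sketch below checks that both reproduce the unsharded result. The all-gather and all-reduce are simulated with `concatenate` and `sum`, and no PyTorch or transformers API is used.

```python
import numpy as np

rng = np.random.default_rng(0)
world_size = 2
x = rng.standard_normal((3, 8))        # (tokens, in_features)
w = rng.standard_normal((6, 8))        # nn.Linear layout: (out_features, in_features)
reference = x @ w.T                    # unsharded output, shape (3, 6)

# Column-wise: split the weight along out_features (dim 0).
# Each rank computes a slice of the output columns; gathering them restores the output.
col_shards = np.split(w, world_size, axis=0)
colwise = np.concatenate([x @ w_r.T for w_r in col_shards], axis=-1)  # simulated all-gather

# Row-wise: split the weight along in_features (dim 1) and the input along its last dim.
# Each rank computes a partial sum; adding them (an all-reduce) restores the output.
row_shards = np.split(w, world_size, axis=1)
x_shards = np.split(x, world_size, axis=-1)
partials = [x_r @ w_r.T for x_r, w_r in zip(x_shards, row_shards)]
rowwise = np.sum(partials, axis=0)                                    # simulated all-reduce

assert np.allclose(colwise, reference)
assert np.allclose(rowwise, reference)

# Megatron-style pairing: a column-wise layer feeding a row-wise layer needs no
# communication in between; only the final partial sums are reduced.
w2 = rng.standard_normal((4, 6))
chained = sum((x @ w1_r.T) @ w2_r.T for w1_r, w2_r in zip(col_shards, np.split(w2, world_size, axis=1)))
assert np.allclose(chained, reference @ w2.T)
```

The last check is why TP plans commonly mark q/k/v and MLP up/gate projections as column-wise and the output and down projections as row-wise: the intermediate activation stays sharded, and each block needs a single reduction.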
3outeille Merge branch 'main' into distributed_api
eeefc9e8
3outeille Merge branch 'distributed_api' into fsdp-vs-ddp
90384750
3outeille revert some files
abfd57ee
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
23a2c059
3outeille Add distributed training scripts
c33873e7
3outeille Merge branch 'distributed_api' of https://github.com/huggingface/tran…
e7832312
3outeille Remove train_fsdp_tp_torchtitan_style.py
34db8405
3outeille unify the utils for fsdp
6f9e2b67
3outeille Merge branch 'distributed_api' into fsdp-vs-ddp
5e017cf5
3outeille force-pushed the fsdp-core-model-loading branch from dbc96197 to c5672400 65 days ago
3outeille force pushed from 34a50850 to eb428cc4 65 days ago
3outeille Fix CI: re-export moved FSDP utils + remove stale type: ignore
37dcc14d
3outeille Merge branch 'fsdp-vs-ddp' into fsdp-core-model-loading
c1dab9eb
3outeille force-pushed the fsdp-core-model-loading branch from c5672400 to c1dab9eb 65 days ago
3outeille Merge branch 'fsdp-core-model-loading' into refactor-tp-dtensor
e0c4e06d
3outeille force pushed from eb428cc4 to e0c4e06d 65 days ago
3outeille Fix ruff formatting in core_model_loading.py
21f05610
3outeille Fix ruff linting and formatting
cd45107f
3outeille Merge branch 'fsdp-core-model-loading' into refactor-tp-dtensor
52c390f2
3outeille Backport new TP/FSDP API from orchestration-save-load branch
ba3990fb
3outeille Fix DTensor imports in Copied-from model files
92a34916
3outeille MoE expert parallelism + sequence parallelism (#45408)
7ca7911b
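With expert parallelism, each rank owns a contiguous block of experts, and each token is sent to the rank that owns the expert it was routed to. The pure-Python sketch below shows that bookkeeping, assuming top-1 routing and an expert count that divides evenly by the EP size. The function names are hypothetical, and an actual dispatch would use an all-to-all collective rather than a dict.

```python
from collections import defaultdict

def expert_owner(expert_id: int, num_experts: int, ep_size: int) -> int:
    """Rank that holds `expert_id` when experts are split into contiguous blocks."""
    if num_experts % ep_size != 0:
        raise ValueError("this sketch assumes num_experts is divisible by ep_size")
    return expert_id // (num_experts // ep_size)

def dispatch(routed_experts: list[int], num_experts: int, ep_size: int) -> dict[int, list[int]]:
    """Group token indices by destination rank (what an all-to-all would send)."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for token_idx, expert_id in enumerate(routed_experts):
        buckets[expert_owner(expert_id, num_experts, ep_size)].append(token_idx)
    return dict(sorted(buckets.items()))

# 8 experts over 4 ranks: rank 0 owns experts 0-1, rank 1 owns 2-3, and so on.
routing = [5, 0, 7, 2, 1, 6]   # top-1 expert chosen for each of 6 tokens
print(dispatch(routing, num_experts=8, ep_size=4))  # {0: [1, 4], 1: [3], 2: [0], 3: [2, 5]}
```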
3outeille do monkey patching for rotary
d4400d5b
3outeille commented on 2026-04-09
3outeille Revert modeling file diffs to match fsdp-core-model-loading base
6793503b
3outeille Migrate all model TP plans from strings to TPStyle
b9435123
3outeille commented on 2026-04-14
3outeille commented on 2026-04-14
3outeille commented on 2026-04-14
3outeille Restore mxfp4.py to match base branch
5ce6faa3
3outeille Drop mla_kv_a_proj and moe_identity_expert from TP plans
b694f364
ArthurZucker commented on 2026-04-15
3outeille commented on 2026-04-15
3outeille more comments
1b82460a
3outeille fix tp for most models. PyTorch doesn't implement all placement conv…
48f8d6f9
3outeille fix tp through _replicate_dtensor
91b48242
3outeille revert small change
44706eb1
3outeille push temporary fix for TP and strided shard for backward
aa45f5ba
3outeille refactor a bit
0a566c52
3outeille patches for rotary
11a55d46
3outeille refactor MoEExpertsParallel
53490d98
3outeille fix tp for last models
0c099155
3outeille refactor moe expert parallels
ebd03ecd
3outeille linting
c08c0714
3outeille add sp plan for models
4804d0d2
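Sequence parallelism shards activations along the sequence dimension in the regions that tensor parallelism leaves replicated (norms, dropout, residual adds). This is valid because those ops treat each token independently. The NumPy sketch below checks this for RMSNorm, written out by hand; the concatenation stands in for the all-gather that has to happen before token-mixing ops such as attention, which cannot run on a sequence shard alone.

```python
import numpy as np

def rms_norm(h: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalizes over the hidden dim only, so each token is processed independently.
    return h * weight / np.sqrt(np.mean(h * h, axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
world_size = 4
hidden = rng.standard_normal((2, 16, 32))   # (batch, seq_len, hidden)
gamma = rng.standard_normal(32)

full = rms_norm(hidden, gamma)

# Each rank normalizes only its 4-token slice of the sequence.
seq_shards = np.split(hidden, world_size, axis=1)
gathered = np.concatenate([rms_norm(s, gamma) for s in seq_shards], axis=1)  # simulated all-gather

assert np.allclose(gathered, full)
```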
3outeille add deepseek v2 sp plan
1a51928a
3outeille undo sp plan for some tricky models
fd3a7221
3outeille changed the base branch from fsdp-core-model-loading to fsdp-vs-ddp 59 days ago
3outeille remove lm_head from config
253b89ee
3outeille first pass of refactoring dtensor shard operator
3ff1fee0
3outeille better refactor
4d96b2dc
3outeille better explanation of DtensorShardOperation
04521bfe
3outeille refactor dtensor test to reflect real world scenario
f710f0d3
3outeille more comments
a35993c1
3outeille fix tp olmo hybrid and exaone
8529d7c2
3outeille Enhance tensor parallel weight tying logic to prevent clobbering of l…
43b792b9
ArthurZucker commented on 2026-04-23
3outeille fix fsdp mixin test due to missing args
0dbef901
3outeille fix test non model
da83f324
3outeille skip sp plan for exaone and olmo hybrid
3903757c
3outeille linting
e51f6633
3outeille fix import for ci
96f3f296
3outeille test distributed config
dfb448e7
3outeille attempt to fix guarding import ci
0a74b7d7
3outeille fix ci check repro
c50e49cc
3outeille add ALL_PARALLEL_STYLES registry alongside TPStyle
f9daf7b1
3outeille route apply_tensor_parallel through ALL_PARALLEL_STYLES
8a1a9e53
3outeille migrate modular files to string-based TP plans
7819783f
3outeille migrate standalone configs and modelings to string-based TP plans
e70ac378
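These commits key TP plans by strings and resolve them through a registry (`ALL_PARALLEL_STYLES`). The pure-Python sketch below shows that lookup, assuming glob-style module patterns like the `layers.*.self_attn.q_proj` keys used in transformers' `base_model_tp_plan`. The registry contents, the placeholder style classes, and `resolve_plan` are all illustrative. They are not the PR's implementation, which per the later "use parallel style from torch" commit builds on PyTorch's parallel styles.

```python
import fnmatch

class ColwiseStyle:
    """Placeholder for a column-wise style (shard Linear weight on out_features)."""

class RowwiseStyle:
    """Placeholder for a row-wise style (shard Linear weight on in_features)."""

# Hypothetical registry: plan strings -> style classes.
PARALLEL_STYLES = {"colwise": ColwiseStyle, "rowwise": RowwiseStyle}

tp_plan = {
    "layers.*.self_attn.q_proj": "colwise",
    "layers.*.self_attn.o_proj": "rowwise",
    "layers.*.mlp.down_proj": "rowwise",
}

def resolve_plan(module_names, plan, registry):
    """Map each module name to an instantiated style; unmatched modules stay replicated.

    Note: fnmatch's `*` also matches dots; a stricter matcher could require an
    integer layer index in that position.
    """
    resolved = {}
    for name in module_names:
        for pattern, style_name in plan.items():
            if fnmatch.fnmatchcase(name, pattern):
                if style_name not in registry:
                    raise KeyError(f"unknown parallel style {style_name!r} for {name}")
                resolved[name] = registry[style_name]()
                break
    return resolved

modules = ["embed_tokens", "layers.0.self_attn.q_proj", "layers.0.self_attn.o_proj", "layers.1.mlp.down_proj"]
for name, style in resolve_plan(modules, tp_plan, PARALLEL_STYLES).items():
    print(name, type(style).__name__)
# layers.0.self_attn.q_proj ColwiseStyle
# layers.0.self_attn.o_proj RowwiseStyle
# layers.1.mlp.down_proj RowwiseStyle
```

One plausible advantage of this split is that string plans stay JSON-serializable inside model configs, while new styles can be registered without touching any config.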
3outeille delete TPStyle dataclass
061d4e69
3outeille fix use_local_output defaults for SequenceParallel and PrepareModuleI…
8e0f60c1
3outeille use parallel style from torch
5b336bd9
3outeille marked this pull request as ready for review 48 days ago
3outeille revert changes in weight converter
465d0295
3outeille remove dead code in set_param_for_module
bc6d6f9f
3outeille remove dead code
f305f924
3outeille cleaning again
39db8c17
3outeille cleaning
951d4ae3
3outeille revert change
1b040ef7
3outeille linting
85ef27c6
3outeille refactor dtensor shard ops
1fd7b1d0
3outeille revert some stuff in core model loading
4547eb33
3outeille core model loading clean
43086d33
ArthurZucker commented on 2026-05-07
ArthurZucker commented on 2026-05-07
3outeille guarding import
1b7ebe1b
3outeille better separation of tensor parallel and generic utils
6d867469
3outeille isolate DtensorShardOperation into a separate file
ff493468
3outeille no need to patch rotary
a806b3db
3outeille better separation
98d2dc56
3outeille simplify gather_full_state_dict
14e02aac
3outeille simplify _replicate_dtensor
9acf944e
3outeille fix and clean _replicate_dtensor
20cf4e8d
3outeille better doc for DtensorShardOperation
ca6d06b1
3outeille fix saving optimizer with DCP for fused weights
7e2115f4
3outeille save_pretrained(distributed_checkpoint=True)
1c6f8484
3outeille linting
41bc6eb0
3outeille refactor into a single function _dtensor_from_local_like
27fc8a90
3outeille zeros_like instead of empty_like
1e25f1f0
3outeille move tp and fsdp under distributed
7405892b
3outeille distribute_model
ed45c917
3outeille fix deadlock when saving
f97c3a44
3outeille clip grad norm function
b59c4bf7
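With sharded gradients no rank holds a whole gradient, so the global norm has to be assembled from per-rank pieces. Each rank sums the squares of its local shards, the sums are all-reduced, and the square root of the total is the norm. The NumPy simulation below applies that reduction and then the usual clipping coefficient (`max_norm / (total_norm + 1e-6)`, capped at 1, as in `torch.nn.utils.clip_grad_norm_`). It assumes the L2 norm and non-overlapping shards. Replicated (non-sharded) gradients would have to be counted once, not once per rank. `sharded_clip_grad_norm` is a hypothetical name, not the function added in this PR.

```python
import numpy as np

def sharded_clip_grad_norm(local_grads_per_rank, max_norm):
    """Simulate global-norm clipping when every rank only holds shards of each gradient.

    local_grads_per_rank[r] is the list of gradient shards stored on rank r.
    Returns the global L2 norm and scaled copies of the shards.
    """
    # Each rank: sum of squares over its local shards (one scalar per rank).
    local_sq = [sum(float(np.sum(g * g)) for g in grads) for grads in local_grads_per_rank]
    # All-reduce (sum) across ranks, then sqrt gives the norm of the full gradients.
    total_norm = float(np.sqrt(sum(local_sq)))
    clip_coef = min(max_norm / (total_norm + 1e-6), 1.0)
    clipped = [[g * clip_coef for g in grads] for grads in local_grads_per_rank]
    return total_norm, clipped

rng = np.random.default_rng(0)
full_grad = rng.standard_normal((8, 4)) * 3.0
shards = np.split(full_grad, 2, axis=0)                # 2 ranks, Shard(0)-style split
total, clipped = sharded_clip_grad_norm([[shards[0]], [shards[1]]], max_norm=1.0)

assert np.isclose(total, np.linalg.norm(full_grad))   # matches the unsharded norm
clipped_full = np.concatenate([clipped[0][0], clipped[1][0]], axis=0)
assert np.linalg.norm(clipped_full) <= 1.0 + 1e-6
```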
3outeille maybe_disable_foreach_and_fused_for_mixed_dtensor_groups
242e814f
3outeille commented on 2026-05-13
3outeille commented on 2026-05-13
3outeille commented on 2026-05-13
3outeille commented on 2026-05-13
3outeille commented on 2026-05-13
3outeille better TP api for ease of understanding
8fe831fc
3outeille remove shard_param to make it easier
bf30f0a1
3outeille fix import in test
261c59b4
3outeille _swap_dtensor_params_for_local
f0f5f674
3outeille fix qwen3 nanochat dots1
c135d0e1
3outeille Merge branch 'main' into fsdp-vs-ddp
10c6563e
3outeille Merge branch 'fsdp-vs-ddp' into refactor-tp-dtensor
5db32b87
3outeille add tpu
920ade5c
3outeille move TP refactor experimentation scripts to backup branch
13646c82
3outeille linting
c49e9abe
3outeille changed the title from "TP refactor for FSDP + TP integration" to "2D // + native save/load distributed" 35 days ago
3outeille changed the title from "2D // + native save/load distributed" to "FSDP + TP & native save/load distributed" 35 days ago
3outeille register distributed sharding_utils and utils in __init__
a4c6ba8e
3outeille rename TP plan styles to match new ALL_PARALLEL_STYLES registry
65b0311a
3outeille enable EP
dbf0c609
3outeille Add enable_expert_parallel configuration option in test_distributed_c…
51068ca9
3outeille requested a review from ArthurZucker 35 days ago
AmineDiro
3outeille no more auto mode
bf0696f5
3outeille edit fsdp plan to every other models
d13ab2e8
3outeille update fsdp mixin tests
41652094
3outeille
3outeille linting
6f5dbfb3
3outeille fix test fsdp
d075668d
3outeille fsdp linting
9472436d
3outeille revert gitignore
9a835eba
ArthurZucker approved these changes on 2026-05-19
3outeille _apply within for loop
86dc7b4a
3outeille rename
9378ccb2
3outeille doc sp plan
e8785942
3outeille fix
8caa870c
3outeille unified setattr + torch.no_grad + _local_tensor
e0e787bc
3outeille revert
9158d99f
3outeille linting
1c8203a0
3outeille changed the base branch from fsdp-vs-ddp to main 30 days ago
3outeille Merge branch 'main' into refactor-tp-dtensor
7fb37af0
3outeille
github-actions
3outeille Merge branch 'main' into refactor-tp-dtensor
0fdc61f0
3outeille fix ruff
7e2f6868
3outeille make check-repository-consistency
e6484d37
github-actions
3outeille enabled auto-merge 30 days ago
Auto-merge disabled 30 days ago (manually disabled by user)
3outeille trigger fsdp mixin test in CI
4b649ed0
3outeille Merge branch 'main' into refactor-tp-dtensor
5d673d31
3outeille fix fsdp ci
6b835cb5
3outeille Merge branch 'refactor-tp-dtensor' of https://github.com/huggingface/…
a38bc328
3outeille enabled auto-merge 30 days ago
Auto-merge disabled 30 days ago (manually disabled by user)
3outeille Reset tests/test_modeling_common.py to main
d563a9c0
github-actions
3outeille merged commit 9ba8e858 into main 30 days ago
3outeille deleted the refactor-tp-dtensor branch 30 days ago
ArthurZucker commented on 2026-05-20