transformers
FSDP + TP & native save/load distributed
#45028
Merged
3outeille
merged 133 commits into
main
from
refactor-tp-dtensor
init
7c843391
Merge branch 'main' into distributed_api
69bc48e1
3outeille
force pushed
from
ccd06bfe
to
fcea5cec
75 days ago
3outeille
force pushed
from
fcea5cec
to
f98e208f
75 days ago
3outeille
commented on 2026-04-08
Merge branch 'main' into distributed_api
b7ec958e
Merge remote-tracking branch 'origin/main' into distributed_api
45a01a54
FSDP2 (fully_shard) integration
a5c25548
DistributedConfig + shard-on-read loading
739332cd
3outeille
force-pushed the
fsdp-core-model-loading
branch
from
607cc114
to
739332cd
66 days ago
TPStyle API + dense model tensor parallelism
11b55a20
3outeille
force pushed
from
1aa7f5f1
to
11b55a20
66 days ago
Merge branch 'main' into distributed_api
eeefc9e8
Merge branch 'distributed_api' into fsdp-vs-ddp
90384750
revert some files
abfd57ee
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
23a2c059
Add distributed training scripts
c33873e7
Merge branch 'distributed_api' of https://github.com/huggingface/tran…
e7832312
Remove train_fsdp_tp_torchtitan_style.py
34db8405
unify the utils for fsdp
6f9e2b67
Merge branch 'distributed_api' into fsdp-vs-ddp
5e017cf5
3outeille
force-pushed the
fsdp-core-model-loading
branch
from
dbc96197
to
c5672400
65 days ago
3outeille
force pushed
from
34a50850
to
eb428cc4
65 days ago
Fix CI: re-export moved FSDP utils + remove stale type: ignore
37dcc14d
Merge branch 'fsdp-vs-ddp' into fsdp-core-model-loading
c1dab9eb
3outeille
force-pushed the
fsdp-core-model-loading
branch
from
c5672400
to
c1dab9eb
65 days ago
Merge branch 'fsdp-core-model-loading' into refactor-tp-dtensor
e0c4e06d
3outeille
force pushed
from
eb428cc4
to
e0c4e06d
65 days ago
Fix ruff formatting in core_model_loading.py
21f05610
Fix ruff linting and formatting
cd45107f
Merge branch 'fsdp-core-model-loading' into refactor-tp-dtensor
52c390f2
Backport new TP/FSDP API from orchestration-save-load branch
ba3990fb
Fix DTensor imports in Copied-from model files
92a34916
MoE expert parallelism + sequence parallelism (#45408)
7ca7911b
do monkey patching for rotary
d4400d5b
3outeille
commented on 2026-04-09
Revert modeling file diffs to match fsdp-core-model-loading base
6793503b
Migrate all model TP plans from strings to TPStyle
b9435123
3outeille
commented on 2026-04-14
3outeille
commented on 2026-04-14
3outeille
commented on 2026-04-14
Restore mxfp4.py to match base branch
5ce6faa3
Drop mla_kv_a_proj and moe_identity_expert from TP plans
b694f364
ArthurZucker
commented on 2026-04-15
3outeille
commented on 2026-04-15
more comments
1b82460a
fix tp for most models. PyTorch doesn't implement all placement conv…
48f8d6f9
fix tp through _replicate_dtensor
91b48242
revert small change
44706eb1
push temporary fix for TP and strided shard for backward
aa45f5ba
refactor a bit
0a566c52
patches for rotary
11a55d46
refactor MoEExpertsParallel
53490d98
fix tp for last models
0c099155
refactor moe expert parallels
ebd03ecd
linting
c08c0714
add sp plan for models
4804d0d2
add deepseek v2 sp plan
1a51928a
undo sp plan for some tricky models
fd3a7221
3outeille
changed the base branch from
fsdp-core-model-loading
to
fsdp-vs-ddp
59 days ago
remove lm_head from config
253b89ee
first pass of refactoring dtensor shard operator
3ff1fee0
better refacto
4d96b2dc
better explanation of DtensorShardOperation
04521bfe
refactor dtensor test to reflect real world scenario
f710f0d3
more comments
a35993c1
fix tp olmo hybrid and exaone
8529d7c2
Enhance tensor parallel weight tying logic to prevent clobbering of l…
43b792b9
ArthurZucker
commented on 2026-04-23
fix fsdp mixin test due to missing args
0dbef901
fix test non model
da83f324
skip sp plan for exaone and olmo hybrid
3903757c
linting
e51f6633
fix import for ci
96f3f296
test distributed config
dfb448e7
attempt to fix guarding import ci
0a74b7d7
fix ci check repro
c50e49cc
add ALL_PARALLEL_STYLES registry alongside TPStyle
f9daf7b1
route apply_tensor_parallel through ALL_PARALLEL_STYLES
8a1a9e53
migrate modular files to string-based TP plans
7819783f
migrate standalone configs and modelings to string-based TP plans
e70ac378
delete TPStyle dataclass
061d4e69
fix use_local_output defaults for SequenceParallel and PrepareModuleI…
8e0f60c1
use parallel style from torch
5b336bd9
3outeille
marked this pull request as ready for review
48 days ago
revert changes in weight converter
465d0295
remove dead code in set_param_for_module
bc6d6f9f
remove dead code
f305f924
cleaning again
39db8c17
cleaning
951d4ae3
revert change
1b040ef7
linting
85ef27c6
refactor dtensor shard ops
1fd7b1d0
revert some stuff in core model loading
4547eb33
core model loading clean
43086d33
ArthurZucker
commented on 2026-05-07
ArthurZucker
commented on 2026-05-07
guarding import
1b7ebe1b
better separation tensor parall and generic utils
6d867469
isolate DtensorShardOperation into a separate file
ff493468
no need to patch rotary
a806b3db
better separation
98d2dc56
simplify gather_full_state_dict
14e02aac
simplify _replicate_dtensor
9acf944e
fix and clean _replicate_dtensor
20cf4e8d
better doc for DtensorShardOperation
ca6d06b1
fix saving optimizer with DCP for fused weights
7e2115f4
save_pretrained(distributed_checkpoint=true)
1c6f8484
linting
41bc6eb0
refactor into a single function _dtensor_from_local_like
27fc8a90
zeros_like instead of empty_like
1e25f1f0
move tp and fsdp under distributed
7405892b
distribute_model
ed45c917
fix deadlock when saving
f97c3a44
clip grad norm function
b59c4bf7
maybe_disable_foreach_and_fused_for_mixed_dtensor_groups
242e814f
3outeille
commented on 2026-05-13
3outeille
commented on 2026-05-13
3outeille
commented on 2026-05-13
3outeille
commented on 2026-05-13
3outeille
commented on 2026-05-13
better TP api for ease of understanding
8fe831fc
remove shard_param to make it easier
bf30f0a1
fix import in test
261c59b4
_swap_dtensor_params_for_local
f0f5f674
fix qwen3 nanochat dots1
c135d0e1
Merge branch 'main' into fsdp-vs-ddp
10c6563e
Merge branch 'fsdp-vs-ddp' into refactor-tp-dtensor
5db32b87
add tpu
920ade5c
move TP refactor experimentation scripts to backup branch
13646c82
linting
c49e9abe
3outeille
changed the title
from
TP refactor for FSDP + TP integration
to
2D // + native save/load distributed
35 days ago
3outeille
changed the title
from
2D // + native save/load distributed
to
FSDP + TP & native save/load distributed
35 days ago
register distributed sharding_utils and utils in __init__
a4c6ba8e
rename TP plan styles to match new ALL_PARALLEL_STYLES registry
65b0311a
enable EP
dbf0c609
Add enable_expert_parallel configuration option in test_distributed_c…
51068ca9
3outeille
requested a review
from
ArthurZucker
35 days ago
no more auto mode
bf0696f5
edit fsdp plan to every other models
d13ab2e8
update fsdp mixin tests
41652094
linting
6f5dbfb3
fix test fsdp
d075668d
fsdp linting
9472436d
revert gitignore
9a835eba
ArthurZucker
approved these changes on 2026-05-19
_apply within for loop
86dc7b4a
rename
9378ccb2
doc sp plan
e8785942
fix
8caa870c
unified settattr + torch no grad + _local_tensor
e0e787bc
revert
9158d99f
linting
1c8203a0
3outeille
changed the base branch from
fsdp-vs-ddp
to
main
30 days ago
Merge branch 'main' into refactor-tp-dtensor
7fb37af0
Merge branch 'main' into refactor-tp-dtensor
0fdc61f0
fix ruff
7e2f6868
make check-repository-consistency
e6484d37
3outeille
enabled auto-merge
30 days ago
disabled auto-merge
30 days ago
Manually disabled by user
trigger fsdp mixin test in CI
4b649ed0
Merge branch 'main' into refactor-tp-dtensor
5d673d31
fix fsdp ci
6b835cb5
Merge branch 'refactor-tp-dtensor' of https://github.com/huggingface/…
a38bc328
3outeille
enabled auto-merge
30 days ago
disabled auto-merge
30 days ago
Manually disabled by user
Reset tests/test_modeling_common.py to main
d563a9c0
3outeille
merged
9ba8e858
into main
30 days ago
3outeille
deleted the refactor-tp-dtensor branch
30 days ago
ArthurZucker
commented on 2026-05-20
Reviewers
ArthurZucker
Assignees
No one assigned
Labels
None yet
Milestone
No milestone