FSDP2 native support in transformers #44083
Add distributed training CI job to CircleCI configuration
216b909b
update naming
4c062ae3
Add TrainingDistributedTesterMixin for distributed training tests (ju…
46ffe625
Add markers for distributed training tests in conftest.py and pyproje…
5bfe881e
Add is_training_distributed_test decorator allowing conditional skip…
de58c0eb
can now run a simple hello world in distributed setting on cpu
084c3966
easier way to grid-search different FSDP x TP configuration of distribu…
868c38d8
add 2D device mesh
61d3ee76
Refactor global_wrapper to use device mesh for distributed training
e14a25cc
instantiate model and begin fsdp
e1a415ee
Improve logging to include rank when distributed training is initialized
e2221a5b
undo fsdp as it is not prio right now (it requires uniformization of …
7b744c32
Merge branch 'main' into v5-distributed-training-ci
7106d1c7
add tp=2 test training
acf75f0a
Refactor training mixins for distributed testing
06eca8a6
Add FSDP2 integration functions
38455328
3outeille
changed the title from "Add distributed training CI job to CircleCI configuration" to "FSDP native support in transformers" 35 days ago
Merge branch 'main' into v5-distributed-training-ci
6fb7c3da
Merge branch 'main' into fsdp-vs-ddp
b5ba82b4
fsdp plan in pretrained_model
5c103fc4
Merge branch 'v5-distributed-training-ci' into fsdp-vs-ddp
d4beae16
Merge branch 'main' into v5-distributed-training-ci
9cbdff45
Merge branch 'main' into fsdp-vs-ddp
a4f11b6e
Merge branch 'v5-distributed-training-ci' into fsdp-vs-ddp
53399cf4
add test fsdp vs ddp for dense model
51af08ea
add dtype
2b607d72
add save/load tests for FSDP2
e0e7aaff
add more tests (manual plan, better resharding wrap)
0d04cf3d
add tied test and it fails + free port
70194667
add sh tests for dev
d0b5ff2d
fix tied fsdp vs ddp auto loss and grad norm test
b7b6cb35
manual plan in tied works finally
d3b9cc45
trigger tests for all text models (dense and moe)
f9ca3170
dispatch gpu tests for dev purpose
4c2f3d2f
slightly bigger model for tests
df95351d
make tests deterministic for dense. Now move on to MoE
62be4a48
add fsdp support for moe
dccef4d1
remove useless files
cfa82fee
breaking: cpu offload and mixed precision almost fixed
e989b926
cleaner test
cb424ee7
model sorted by usage
698401a6
refactor fsdp + tests fsdp mixin
affbb99d
3outeille
changed the base branch from v5-distributed-training-ci to main 13 days ago
Merge branch 'main' into fsdp-vs-ddp
80cedf79
add fsdp tests in ci for every models
9e6505e0
Merge branch 'main' into fsdp-vs-ddp
d5aa3174
3outeille
marked this pull request as ready for review 13 days ago
Merge branch 'main' into fsdp-vs-ddp
d6dc068d
Merge branch 'main' into fsdp-vs-ddp
4f456a6b
Update mixed precision policy in FSDP integration to set output_dtype…
ccf29b5e
Merge branch 'fsdp-vs-ddp' of github.com:huggingface/transformers int…
c209232a
cursor
commented on 2026-03-12
save / load with DCP + safetensors
3f1dcdd2
cursor
commented on 2026-03-12
Merge branch 'main' into fsdp-vs-ddp
f1d79c68
linting
29873f06
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
a4fd9372
Merge branch 'main' into fsdp-vs-ddp
2c285248
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
30db8386
Merge branch 'main' into fsdp-vs-ddp
c7b71101
fix RuntimeError: expected data_ptr to be aligned to 16 bytes
eb2a1c03
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
6be343f6
make tests run on CPU only
a1acc285
don't test mixed precision as it is too flaky; end-to-end results are …
e2ff1b1e
Merge branch 'main' into fsdp-vs-ddp
37c4db7a
undo grouped test
0e706081
unskip FSDP test for BLT
fd568474
Merge branch 'main' into fsdp-vs-ddp
498846b8
Merge branch 'main' into fsdp-vs-ddp
e0a09ef9
Revert "undo grouped test"
13d8c0ba
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
ec9991c0
trigger fsdp mixin only for the 10 most downloaded models in dense and m…
236b10e7
cleaning
222e9ac1
Merge branch 'main' into fsdp-vs-ddp
443ebcd6
restoring test training mixin
0046f00a
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
b052b572
add logging to profile how long test takes
bd8209dc
Merge branch 'main' into fsdp-vs-ddp
814ddd83
undo fsdp2 test all
421861db
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
dd86a46d
Remove skipped test for FSDP all-in-one due to recurrent-specific sha…
d6fcf621
Merge branch 'main' into fsdp-vs-ddp
12b4c759
linting
3d8d54fb
for save/load test, just test it on 3 steps only
28dedc1a
fucking dist.barrier()
2db5f951
Merge branch 'main' into fsdp-vs-ddp
b65acdd6
force eager attn
d2fd9d15
better way to pass eager by default
7315a488
linting
6ad2cb92
Merge branch 'main' into fsdp-vs-ddp
3bc60f2e
3outeille
changed the title from "FSDP native support in transformers" to "FSDP2 native support in transformers" 6 days ago
skip modernbert decoder test
e3eae48d
better import
a6593ace
don't fall back to cpu for mps backend
c5415974
typo
08a3b440
rename function
f9890e39
remove distribute_fsdp_model
c994a899
reuse device_mesh
900e227d
import guarding
0fb7ef76
Merge branch 'main' into fsdp-vs-ddp
56c94967
accept auto string for fsdp plan + pass is_fsdp_managed_module to utils
55faa289
Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
4b4e796e
Merge branch 'main' into fsdp-vs-ddp
5a3c59c6
import top level
99af9a41