transformers
FSDP2 native support in transformers
#44083
Open

3outeille wants to merge 98 commits into main from fsdp-vs-ddp
3outeille Add distributed training CI job to CircleCI configuration
216b909b
3outeille update naming
4c062ae3
3outeille Add TrainingDistributedTesterMixin for distributed training tests (ju…
46ffe625
3outeille Add markers for distributed training tests in conftest.py and pyproje…
5bfe881e
3outeille Add is_training_distributed_test decorator allowing conditional skip…
de58c0eb
3outeille can now run a simple hello world in distributed setting on cpu
084c3966
3outeille easier way to gridseach different FSDP x TP configuration of distribu…
868c38d8
3outeille add 2D device mesh
61d3ee76
3outeille Refactor global_wrapper to use device mesh for distributed training
e14a25cc
3outeille instantiate model and begin fsdp
e1a415ee
3outeille Improve logging to include rank when distributed training is initialized
e2221a5b
3outeille undo fsdp as it is not prio right now (it requires uniformization of …
7b744c32
3outeille Merge branch 'main' into v5-distributed-training-ci
7106d1c7
3outeille add tp=2 test training
acf75f0a
3outeille Refactor training mixins for distributed testing
06eca8a6
3outeille Add FSDP2 integration functions
38455328
3outeille changed the title from "Add distributed training CI job to CircleCI configuration" to "FSDP native support in transformers" 35 days ago
3outeille Merge branch 'main' into v5-distributed-training-ci
6fb7c3da
3outeille Merge branch 'main' into fsdp-vs-ddp
b5ba82b4
3outeille fsdp plan in pretrained_model
5c103fc4
HuggingFaceDocBuilderDev
3outeille Merge branch 'v5-distributed-training-ci' into fsdp-vs-ddp
d4beae16
3outeille Merge branch 'main' into v5-distributed-training-ci
9cbdff45
3outeille Merge branch 'main' into fsdp-vs-ddp
a4f11b6e
3outeille Merge branch 'v5-distributed-training-ci' into fsdp-vs-ddp
53399cf4
3outeille add test fsdp vs ddp for dense model
51af08ea
3outeille add dtype
2b607d72
3outeille add save/load tests for FSDP2
e0e7aaff
3outeille add more test (manual plan, better resharding wrap)
0d04cf3d
3outeille add tied test and it fails + free port
70194667
3outeille add sh tests for dev
d0b5ff2d
3outeille fix tied fsdp vs ddp auto loss and grad norm test
b7b6cb35
3outeille manual plan in tied works finally
d3b9cc45
3outeille trigger tests for all text models (dense and moe)
f9ca3170
3outeille dispatch gpu tests for dev purpose
4c2f3d2f
3outeille slightly bigger model for tests
df95351d
3outeille make tests deterministic for dense. Now move on to MoE
62be4a48
3outeille add fsdp support for moe
dccef4d1
3outeille remove uselss files
cfa82fee
3outeille breaking: cpu offload and mixed precision almost fixed
e989b926
3outeille cleaner test
cb424ee7
3outeille model sorted by usage
698401a6
3outeille refactor fsdp + tests fsdp mixin
affbb99d
3outeille changed the base branch from v5-distributed-training-ci to main 13 days ago
3outeille Merge branch 'main' into fsdp-vs-ddp
80cedf79
3outeille add fsdp tests in ci for every models
9e6505e0
3outeille Merge branch 'main' into fsdp-vs-ddp
d5aa3174
3outeille marked this pull request as ready for review 13 days ago
3outeille Merge branch 'main' into fsdp-vs-ddp
d6dc068d
github-actions requested a review from SunMarc 13 days ago
github-actions requested a review from ydshieh 13 days ago
3outeille Merge branch 'main' into fsdp-vs-ddp
4f456a6b
SunMarc commented on 2026-03-11
3outeille Update mixed precision policy in FSDP integration to set output_dtype…
ccf29b5e
3outeille Merge branch 'fsdp-vs-ddp' of github.com:huggingface/transformers int…
c209232a
cursor commented on 2026-03-12
3outeille save / load with dcp + saftensors
3f1dcdd2
cursor commented on 2026-03-12
3outeille Merge branch 'main' into fsdp-vs-ddp
f1d79c68
3outeille linting
29873f06
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
a4fd9372
3outeille Merge branch 'main' into fsdp-vs-ddp
2c285248
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
30db8386
3outeille Merge branch 'main' into fsdp-vs-ddp
c7b71101
3outeille fix RuntimeError: expected data_ptr to be aligned to 16 bytes
eb2a1c03
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
6be343f6
3outeille make tests run on CPU only
a1acc285
3outeille dont test mixed precision as it is too flaky, End to end results are …
e2ff1b1e
3outeille Merge branch 'main' into fsdp-vs-ddp
37c4db7a
3outeille undo grouped test
0e706081
3outeille unskip FSDP test for BLT
fd568474
3outeille Merge branch 'main' into fsdp-vs-ddp
498846b8
3outeille Merge branch 'main' into fsdp-vs-ddp
e0a09ef9
3outeille Revert "undo grouped test"
13d8c0ba
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
ec9991c0
3outeille trigger fsdp mixin only to the 10 most download models in dense and m…
236b10e7
3outeille cleaning
222e9ac1
3outeille Merge branch 'main' into fsdp-vs-ddp
443ebcd6
3outeille restoring test traning mixin
0046f00a
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
b052b572
3outeille add logging to profile how long test takes
bd8209dc
3outeille Merge branch 'main' into fsdp-vs-ddp
814ddd83
3outeille undo fsdp2 test all
421861db
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
dd86a46d
3outeille Remove skipped test for FSDP all-in-one due to recurrent-specific sha…
d6fcf621
3outeille Merge branch 'main' into fsdp-vs-ddp
12b4c759
3outeille linting
3d8d54fb
3outeille for save/load test, just test it on 3 steps only
28dedc1a
3outeille fucking dist.barrier()
2db5f951
3outeille Merge branch 'main' into fsdp-vs-ddp
b65acdd6
3outeille force eager attn
d2fd9d15
3outeille better way to pass eager by default
7315a488
3outeille linting
6ad2cb92
3outeille Merge branch 'main' into fsdp-vs-ddp
3bc60f2e
3outeille changed the title from "FSDP native support in transformers" to "FSDP2 native support in transformers" 6 days ago
3outeille skip modernbert decoder test
e3eae48d
3outeille requested a review from ArthurZucker 6 days ago
3outeille requested a review from SunMarc 6 days ago
3outeille removed review request from ydshieh 5 days ago
ArthurZucker commented on 2026-03-19
kashif
3outeille better import
a6593ace
3outeille dont fallback to cpu for mps backend
c5415974
3outeille typo
08a3b440
3outeille rename function
f9890e39
3outeille remove distribute_fsdp_model
c994a899
3outeille resue device_mesh
900e227d
3outeille import guarding
0fb7ef76
3outeille Merge branch 'main' into fsdp-vs-ddp
56c94967
3outeille accept auto string for fsdp plan + pass is_fsdp_managed_module to utils
55faa289
3outeille Merge branch 'fsdp-vs-ddp' of https://github.com/huggingface/transfor…
4b4e796e
3outeille Merge branch 'main' into fsdp-vs-ddp
5a3c59c6
github-actions
3outeille import top level
99af9a41
github-actions
