accelerate
Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh)
#3682
Merged
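The title bundles several parallelism strategies (HSDP replication and sharding, tensor parallelism, and, per the later "adding dp_cp dims" commit, a context-parallel dimension) that share one device mesh. As a rough, library-independent illustration, and not accelerate's `ParallelismConfig` API, the sketch below models such a mesh. It checks that the dimension sizes multiply to the world size and lists which ranks form a group along one or more dimensions. `MeshPlan` and the dimension names are hypothetical; the names only echo the commit messages. The sketch assumes a row-major rank layout, like the one `torch.arange(world_size).reshape(shape)` would produce.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class MeshPlan:
    """Hypothetical 4-D device-mesh layout (illustration only, not accelerate's API).

    Dimensions are listed outer-to-inner; ranks are assumed to be laid out in
    row-major order, so the innermost dimension ("tp") varies fastest.
    """

    dp_replicate: int = 1  # HSDP: replica groups, gradients all-reduced across them
    dp_shard: int = 1      # FSDP-style sharding: params sharded within each group
    cp: int = 1            # context parallelism
    tp: int = 1            # tensor parallelism

    def dims(self) -> list[tuple[str, int]]:
        # Keep only dimensions larger than 1; fall back to a single trivial dim.
        all_dims = [
            ("dp_replicate", self.dp_replicate),
            ("dp_shard", self.dp_shard),
            ("cp", self.cp),
            ("tp", self.tp),
        ]
        kept = [(name, size) for name, size in all_dims if size > 1]
        return kept or [("dp_shard", 1)]

    @property
    def world_size(self) -> int:
        return prod(size for _, size in self.dims())

    def validate(self, world_size: int) -> None:
        # Every size must be positive and the mesh must cover all ranks exactly.
        sizes = {"dp_replicate": self.dp_replicate, "dp_shard": self.dp_shard,
                 "cp": self.cp, "tp": self.tp}
        bad = [name for name, size in sizes.items() if size < 1]
        if bad:
            raise ValueError(f"sizes must be >= 1: {bad}")
        if self.world_size != world_size:
            raise ValueError(
                f"mesh covers {self.world_size} ranks, but world_size is {world_size}"
            )

    def coords(self, rank: int) -> dict[str, int]:
        # Decompose a flat rank into per-dimension coordinates (row-major).
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside mesh of size {self.world_size}")
        parts = []
        for name, size in reversed(self.dims()):
            parts.append((name, rank % size))
            rank //= size
        return dict(reversed(parts))

    def group(self, rank: int, *dims: str) -> list[int]:
        # Ranks that match `rank` on every coordinate except those in `dims`,
        # i.e. the process group along `dims` (several dims = a flattened group).
        me = self.coords(rank)
        return [
            r
            for r in range(self.world_size)
            if all(c == me[k] for k, c in self.coords(r).items() if k not in dims)
        ]


if __name__ == "__main__":
    plan = MeshPlan(dp_replicate=2, dp_shard=2, tp=2)  # 2 x 2 x 2 = 8 GPUs
    plan.validate(8)
    print(plan.coords(5))                             # {'dp_replicate': 1, 'dp_shard': 0, 'tp': 1}
    print(plan.group(5, "tp"))                        # [4, 5]
    print(plan.group(5, "dp_shard"))                  # [5, 7]
    print(plan.group(5, "dp_replicate", "dp_shard"))  # [1, 3, 5, 7]
```

Passing several dimensions to `group` mimics a flattened sub-mesh (for example, all data-parallel ranks regardless of replicate/shard split). A real implementation would build these groups with the distributed runtime rather than by enumeration.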


salmanmohammadi
S1ro1 Feat: init
2f471e3a
S1ro1 Feat: add validation + init from kwargs
43b1ca7a
S1ro1 Fix: minor fixes
79faa138
S1ro1 Feat: more cleanup
16f348bb
S1ro1 Minor refactor
53ef5247
S1ro1 remove import
cd31b02b
salmanmohammadi adding support for pre-configured device mesh
2d892105
salmanmohammadi commented on 2025-07-16
salmanmohammadi commented on 2025-07-16
salmanmohammadi adding device mesh to fsdp2
afaafef6
salmanmohammadi commented on 2025-07-16
salmanmohammadi moving mesh dim defn to ParallelismConfig
2d952cba
salmanmohammadi tests
91ca626f
salmanmohammadi WIP device mesh/accelerator validation
910368b8
salmanmohammadi WIP more tests
b7d154ec
salmanmohammadi Test Driven Development (TDD)
8a0de72b
salmanmohammadi fixing build_device_mesh
1c68efb4
salmanmohammadi FSDP dim names
e01abf13
adding example
69b523c6
WIP
c765a447
salmanmohammadi changed the title from [WIP] Parallelism config + BYODM (Bring Your Own Device Mesh) to [WIP] Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) 244 days ago
fixing HSDP
8d97930c
S1ro1 Feat: add back old options
57c0d9e0
working example
c93285a4
winglian commented on 2025-07-21
winglian commented on 2025-07-21
debugging
cb40d36d
adding parallelism config to partialstate
b76ee67a
S1ro1 Feat: revert ddp changes
9aa26123
S1ro1 Revert DDP
de96e74d
S1ro1 force pushed from 1e4a6215 to 9de55fcb 243 days ago
S1ro1 Feat: (untested) update mesh dims and some minor tweaks
fd05e3bc
S1ro1 force pushed from 9de55fcb to fd05e3bc 243 days ago
adding dp_cp dims
efc903e0
updating comments
7c3d0e3c
winglian commented on 2025-07-22
S1ro1 commented on 2025-07-22
SunMarc commented on 2025-07-22
WIP
3cfce252
wip 2
1bbdb751
reverting
aa749ad3
storing state in accelerator rather than acceleratorstate
aa745766
S1ro1 Fix: minor tweaks
4e99b9cd
wip example update
3d235cb4
merging
61868c29
S1ro1 Fixes for non-fsdp2 case
f96fea3c
S1ro1 Feat: ensure ddp/tp only works
dd894525
winglian commented on 2025-07-23
winglian commented on 2025-07-23
salmanmohammadi commented on 2025-07-23
updating example
7f243e09
updating example
4a2dd58f
updating examples, fixing state
dc145c2b
salmanmohammadi commented on 2025-07-23
fixed state
f21547f3
comments
1a49c164
fixing partial state check
07bf2b3b
SunMarc commented on 2025-07-23
linting
f274b354
salmanmohammadi requested a review from S1ro1 242 days ago
salmanmohammadi requested a review from SunMarc 242 days ago
salmanmohammadi changed the title from [WIP] Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) to Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) 242 days ago
comments
a6feca96
removing fn
80deb7ee
merging
52c178fe
winglian commented on 2025-07-23
winglian commented on 2025-07-23
S1ro1 WIP: fix tp
133ef5f7
S1ro1 commented on 2025-07-23
winglian commented on 2025-07-24
SunMarc commented on 2025-07-24
comments
74009ea7
salmanmohammadi requested a review from SunMarc 240 days ago
removing return
379daa0b
reverting upcast
168b5202
winglian commented on 2025-07-24
winglian add guards
76a546fd
winglian guards for empty self.parallelism_config
e8963dc1
salmanmohammadi commented on 2025-07-25
salmanmohammadi commented on 2025-07-25
salmanmohammadi commented on 2025-07-25
salmanmohammadi commented on 2025-07-25
salmanmohammadi commented on 2025-07-25
salmanmohammadi commented on 2025-07-25
winglian use len on tuple to check if empty
a402faff
winglian force pushed from d9aec5cf to a402faff 239 days ago
S1ro1 Feat: cleanup example
235d29ff
S1ro1 Feat: some cleanup of example
1017752a
S1ro1 Merge branch 'main' into device_mesh_parallelism_config
36a1234c
S1ro1 Feat: add trackio
7ddb3abb
S1ro1 Fix: improve trackio
9fdc320d
S1ro1 Feat: TP works
00dd4af6
S1ro1 Feat: some fsdp2 improv
d21ff9f2
S1ro1 Feat: working examples
d2608422
S1ro1 commented on 2025-07-28
winglian handle clipping for tensor parallel
8b89d278
S1ro1 Implicit replicate
4709fc88
SunMarc commented on 2025-07-29
SunMarc commented on 2025-07-29
S1ro1 Refactor: move to separate file + cleanup + basic comments
353b5593
S1ro1 Fix: add unadded files, fix circular import
7364440b
S1ro1 Feat: better readme
e90f832c
S1ro1 Feat: add blog + ultrascale links
044c7130
S1ro1 Tmp: should_save_model now returns only true
464a642d
SunMarc approved these changes on 2025-07-30
S1ro1 Fix: remove implicit_replication and style
f85eadf1
S1ro1 Fix: remove optional
86771e25
winglian add guard on parallelism_config.tp_enabled
c80aae08
winglian fix import
c8a2ae56
fixing empty parallelism_config
ec59f84d
winglian fix import path for test patch
0afb69f1
fixing patch
89aad7a3
merging
c570f7c3
S1ro1 merged commit 9359a019 into main 234 days ago
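The history includes the commit "handle clipping for tensor parallel", but this page does not show its diff. The sketch below is a generic illustration only, not the PR's code. When a gradient is split across tensor-parallel ranks, the clipping norm must be the norm of the whole gradient: the square root of the sum of every shard's squared entries. Every shard must then be scaled by the same coefficient. Plain Python lists stand in for per-rank shards, and `global_grad_norm` and `clip_shards` are hypothetical names. The coefficient `max_norm / (total + eps)`, capped at 1, mirrors what `torch.nn.utils.clip_grad_norm_` computes.

```python
import math


def global_grad_norm(local_shards: list[list[float]]) -> float:
    # Sum squared entries over *all* shards, then take one square root.
    # In a real TP run the cross-rank sum would be an all-reduce (assumption:
    # each element lives on exactly one rank, i.e. no replicated grads here).
    squared = sum(g * g for shard in local_shards for g in shard)
    return math.sqrt(squared)


def clip_shards(local_shards: list[list[float]], max_norm: float, eps: float = 1e-6):
    # One shared clip coefficient for every shard, derived from the global norm.
    total = global_grad_norm(local_shards)
    coef = min(1.0, max_norm / (total + eps))
    return [[g * coef for g in shard] for shard in local_shards], total


if __name__ == "__main__":
    shards = [[3.0], [4.0]]  # one gradient vector split over 2 TP ranks
    clipped, total = clip_shards(shards, max_norm=1.0)
    print(total)  # 5.0
    # Naive per-rank clipping would see norms 3.0 and 4.0 and scale each shard
    # differently, changing the gradient's direction.
    print([global_grad_norm([s]) for s in shards])  # [3.0, 4.0]
```

Parameters that are replicated across TP ranks rather than sharded would have to be counted once, not once per rank; the sketch assumes every element is sharded.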
