PATCH: add back n-dim device-mesh + fix tp trainer saving #39693
Feat: something
4dd497fc
Feat: initial changes
08f54bbe
tmp changes to unblock
f84ecc45
Refactor
17d2d695
remove todo
56d2c9e2
Merge branch 'main' into fsdp2-tp
622e9b97
Feat: docstring
b35ac20a
Merge branch 'main' into fsdp2-tp
33e28196
Merge branch 'main' into fsdp2-tp
83dedd8a
S1ro1
changed the title from "PATCH: add back n-dim device-mesh" to "PATCH: add back n-dim device-mesh + fix tp hook registration" 334 days ago
S1ro1
force pushed from bf21f0a3 to 40fabad8 334 days ago
S1ro1
force pushed from 40fabad8 to 83dedd8a 333 days ago
Fix: saving of distributed model in trainer
2423039f
Fix: distributed saving with trainer
4ed16393
Feat: add pure tp saving
b5708c8e
S1ro1
changed the title from "PATCH: add back n-dim device-mesh + fix tp hook registration" to "PATCH: add back n-dim device-mesh + fix tp trainer saving" 333 days ago
Only require tp dim if ndim > 1
edd76843
Fix: default to None
d6581d83
Fix: better comments/errors
bba981c3
Fix: properly check tp_size attribute
60a96877
Fix: properly check for None in tp_size
354e68f6
Merge branch 'main' into fsdp2-tp
cacd06b3
S1ro1
enabled auto-merge (squash) 333 days ago
S1ro1
merged 4c7da9fe into main 333 days ago
S1ro1
deleted the fsdp2-tp branch 333 days ago