Implement Fully Sharded Data Parallel (FSDP) in PyTorch XLA #3431
ronghanghu force-pushed from dca02d2c to 1c8193c5 3 years ago
ronghanghu force-pushed from 1c8193c5 to 2fbc2556 3 years ago
ronghanghu force-pushed from 2fbc2556 to ea8a07a9 3 years ago
ronghanghu force-pushed from ea8a07a9 to 614f876e 3 years ago
ronghanghu force-pushed from 614f876e to bc0ccb3c 3 years ago
ronghanghu force-pushed from bc0ccb3c to 59b62e1c 3 years ago
ronghanghu force-pushed from 59b62e1c to 935c60a8 3 years ago
ronghanghu force-pushed from 935c60a8 to 2f3a85ad 3 years ago
ronghanghu force-pushed from 96ae7fe6 to accf50db 3 years ago
ronghanghu force-pushed from accf50db to 288c882e 3 years ago
ronghanghu force-pushed from 288c882e to 74692e8e 3 years ago
ronghanghu force-pushed from 74692e8e to 79227596 3 years ago
ronghanghu force-pushed from 79227596 to f15140b9 3 years ago
ronghanghu marked this pull request as ready for review 3 years ago
hjm-aws requested changes on 2022-03-30
hjm-aws requested changes on 2022-03-31
ronghanghu force-pushed from 65af8177 to 414da498 3 years ago
miladm commented on 2022-04-14
miladm commented on 2022-04-28
ronghanghu force-pushed from 776816fb to 32e9032d 3 years ago
ronghanghu force-pushed from 84f3e1b9 to aad4002a 3 years ago
ronghanghu force-pushed from aad4002a to f3850055 3 years ago
ronghanghu force-pushed from f3850055 to 1b97474c 3 years ago
ronghanghu force-pushed from 2a924b69 to f47733d6 3 years ago
ronghanghu force-pushed from e4d1240b to bd7cd0f4 3 years ago
9a424948  Implement Fully Sharded Data Parallel (FSDP) in PyTorch XLA
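The wrapper class this PR adds lives in `torch_xla.distributed.fsdp`. Below is a minimal usage sketch, not code from the PR itself: the toy model, optimizer, and data are placeholders, and it assumes the process was launched on an XLA device (e.g. via `xmp.spawn`).

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()
model = torch.nn.Sequential(
    torch.nn.Linear(128, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
).to(device)

# Wrap the module so its parameters are sharded across data-parallel workers.
model = FSDP(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 128, device=device)
targets = torch.randint(0, 10, (8,), device=device)

loss = torch.nn.functional.cross_entropy(model(inputs), targets)
loss.backward()
# With FSDP, gradients are already reduced across ranks during backward,
# so call optimizer.step() directly rather than xm.optimizer_step(optimizer).
optimizer.step()
xm.mark_step()  # materialize the step on the XLA device
```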
a782a618  move the FSDP module to `torch_xla.distributed`
e202d52c  adding `mark_step_on_freeing` as a temp workaround to #3455
feac851e  check in __init__ whether the module is already FSDP; fix exception t…
6cc577fd  add `optimization_barrier_` (https://github.com/pytorch/xla/pull/3493…
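`xm.optimization_barrier_` (added in pytorch/xla#3493 and relied on by the later commits here) places an in-place barrier on a list of tensors so the XLA compiler cannot fuse or reorder computation across that point. A minimal standalone sketch of the call, separate from how FSDP applies it internally:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.randn(4, 4, device=device)
b = a @ a

# In-place barrier on the listed tensors: XLA will not fuse or reorder
# computation across this point. FSDP uses it to keep the all-gathered full
# parameters from being held live across the forward/backward boundary.
xm.optimization_barrier_([a, b])

c = (a + b).sum()
xm.mark_step()
```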
86b85238  also apply `xm.optimization_barrier_` to FSDP output's gradients
406de3f8  deprecate `mark_step_on_freeing` (since we have optimization barrier …
abe7b568  add option to run a dummy forward pass in FSDP
efd98ac6  add `_shard_size_multiple` to make sharded parameters a multiple of 1…
f6119ca2  refactor optimization_barrier_ to separately apply to forward and bac…
d1d1483f  seal off more relevant ops w/ optimization_barrier_ to avoid undesire…
267f4f66  remove obsolete `mark_step_on_freeing` and `use_all_gather_via_all_re…
09806f1f  handle keyword arguments in `checkpoint_module`
ec9ee681  add gradient checkpointing option to MNIST and ImageNet FSDP examples
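A minimal sketch of how the gradient checkpointing option combines `checkpoint_module` with the FSDP wrapper; the two-layer toy model is a placeholder, and the exact wrapping in the MNIST/ImageNet examples may differ:

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP
from torch_xla.distributed.fsdp import checkpoint_module

device = xm.xla_device()
block1 = torch.nn.Linear(64, 64).to(device)
block2 = torch.nn.Linear(64, 10).to(device)

# checkpoint_module re-runs each block's forward during backward instead of
# storing its activations; keyword arguments passed to the block's forward
# are also handled (one of the fixes in this commit history).
model = FSDP(torch.nn.Sequential(
    checkpoint_module(block1),
    torch.nn.ReLU(),
    checkpoint_module(block2),
))

inputs = torch.randn(8, 64, device=device, requires_grad=True)
loss = model(inputs).sum()
loss.backward()  # block activations are recomputed here rather than stored
xm.mark_step()
```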
d2b95978  refactor `optimization_barrier` and only apply it in forward or backw…
eeca5bb0  refactor command line tool to consolidate sharded checkpoints
327c78ea  address reviewers' comments from GitHub
b12ea4d4  add more user instructions for checkpoint consolidation
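Because each rank holds only its shard of the parameters, every rank saves its own checkpoint file along with shard metadata, and the command line tool then stitches the shards back into a full model state dict. A rough sketch of the saving side (the toy model and the paths are placeholders; the consolidation flags in the comment are approximate, see the FSDP README added by this PR for the exact usage):

```python
import torch
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()
model = FSDP(torch.nn.Linear(16, 16).to(device))

ckpt = {
    'model': model.state_dict(),                   # this rank's parameter shards
    'shard_metadata': model.get_shard_metadata(),  # needed for consolidation
}
ckpt_path = f'/tmp/fsdp_ckpt_rank-{xm.get_ordinal()}-of-{xm.xrt_world_size()}.pth'
xm.save(ckpt, ckpt_path, master_only=False)  # every rank writes its own file

# The sharded files are then merged offline with the consolidation tool,
# roughly along these lines (exact flags are documented in the FSDP README):
#   python3 -m torch_xla.distributed.fsdp.consolidate_sharded_ckpts \
#       --ckpt_prefix /tmp/fsdp_ckpt --ckpt_suffix "_rank-*-of-*.pth"
```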
4e51a277  change `flatten_parameters` default to False since it didn't bring an…
191ac9b4  documentation refinement
ronghanghu force-pushed from a5855a1c to 191ac9b4 3 years ago
miladm approved these changes on 2022-05-09
miladm merged 3c83269f into master 3 years ago