Add ZenFlow code for Stage 1 & 2 #7391
Antlera force pushed from 32a9ff90 to 53c564d0 91 days ago
Antlera force pushed from 53c564d0 to 32a9ff90 91 days ago
Add ZenFlow optimizers (zero stage 1&2) for ZeRO integration
3309b49d
Add ZenFlowConfig for optimizer configuration
4e9fe2a2
Add ZenFlow (zero stage 1&2) integration in DeepSpeedEngine
cac5703c
Add unit tests for ZenFlowConfig
0e9a0c9e
Fix initialization and update logic for ZenFlow optimizers
3353e34a
Add unit tests for ZenFlowSelectiveAdamW optimizer
28cdf89e
Add ZenFlow tutorial documentation
f534d5e3
Format code
80ad4889
Fix check_grad_overflow parameter in ZenFlowZeroOptimizer
9c05ccba
Refactor ZenFlowZeroOptimizer methods to include communication data type
da80ff75
Merge remote-tracking branch 'upstream/master' into zenflow_zero1_2
417932ae
Refactor ZenFlow integration in DeepSpeedEngine
fee24ffb
Antlera force pushed from 611fbe84 to fee24ffb 90 days ago
Refactor ZenFlow function calls in DeepSpeedEngine
a528fd47
Merge branch 'master' into zenflow_zero1_2
9aac3c0b
Fix bugs in ZenFlow + ZeRO Stage 1 and gradient reduction logic
f7bc35d7
Add unit tests for ZenFlow with ZeRO Stage 1 and 2
3638d789
Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into z…
fad8498d
Refactor ZenFlow integration using separate engine file
6d683302
Fix missing `[comm_dtype]` and format code
913f9a7d
Merge branch 'master' into zenflow_zero1_2
6b8c82ab
Update CPUADAM core range calculation in zenflow_stage_1_and_2.py
bce0a7f8
Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
6f51348b
Fix bugs in ZenFlow unit tests
0ef3fafb
Merge branch 'master' into zenflow_zero1_2
a6235566
Merge remote-tracking branch 'origin/zenflow_zero1_2' into clr_branch…
b898eafe
Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
e2a2b816
Fix: Add PyTorch version check for ZenFlow configuration
8d6b6f34
Merge branch 'master' into zenflow_zero1_2
1e70efab
Enhance ZenFlow compatibility checks for PyTorch version
891ac093
Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
0d7d0864
Merge branch 'master' into zenflow_zero1_2
da902eb6
Fix bugs in ZenFlow unit tests when using CPU Torch
d2d1a06e
Merge branch 'master' into zenflow_zero1_2
4cb3178a
Merge branch 'master' into zenflow_zero1_2
4d1db6d4
Added TODO comments to indicate the need for removing ZenFlow-specifi…
f3b22769
Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
e48622c5
Fix formatting in test_zf.py
bbb6f744
Update docs/_tutorials/zenflow.md
9f4fb585
Antlera force pushed from 6ecbc01d to 9f4fb585 55 days ago
Merge branch 'master' into zenflow_zero1_2
df701501
delock commented on 2025-08-07
delock commented on 2025-08-08
Merge branch 'master' into zenflow_zero1_2
29c5f282
Merge branch 'master' into zenflow_zero1_2
dc505a75
Fix copyrights.
938e8a3c
Remove CUDA specific code.
8951fa08
Merge branch 'master' into zenflow_zero1_2
ef166ad9
Merge branch 'master' into zenflow_zero1_2
b849af66
Merge branch 'master' into zenflow_zero1_2
0593fc21
Merge branch 'master' into zenflow_zero1_2
671c5688
Merge branch 'master' into zenflow_zero1_2
8ed876cb
Merge branch 'master' into zenflow_zero1_2
900491e7