DeepSpeed
Add Zenflow code for Stage 1 & 2
#7391
Merged

Antlera
Antlera requested a review from tjruwase 91 days ago
Antlera requested a review from tohtana 91 days ago
Antlera requested a review from loadams 91 days ago
Antlera force pushed from 32a9ff90 to 53c564d0 91 days ago
Antlera requested a review from jomayeri 91 days ago
Antlera requested a review from hwchen2017 91 days ago
Antlera requested a review from GuanhuaWang 91 days ago
Antlera force pushed from 53c564d0 to 32a9ff90 91 days ago
tohtana commented on 2025-06-27
Antlera
tohtana
Antlera Add ZenFlow optimizers (zero stage 1&2) for ZeRO integration
3309b49d
Antlera Add ZenFlowConfig for optimizer configuration
4e9fe2a2
Antlera Add ZenFlow (zero stage 1&2) integration in DeepSpeedEngine
cac5703c
Antlera Add unit tests for ZenFlowConfig
0e9a0c9e
Antlera Fix initialization and update logic for ZenFlow optimizers
3353e34a
Antlera Add unit tests for ZenFlowSelectiveAdamW optimizer
28cdf89e
Antlera Add ZenFlow tutorial documentation
f534d5e3
Antlera Format code
80ad4889
Antlera Fix check_grad_overflow parameter in ZenFlowZeroOptimizer
9c05ccba
Antlera Refactor ZenFlowZeroOptimizer methods to include communication data type
da80ff75
Antlera Merge remote-tracking branch 'upstream/master' into zenflow_zero1_2
417932ae
Antlera Refactor ZenFlow integration in DeepSpeedEngine
fee24ffb
Antlera force pushed from 611fbe84 to fee24ffb 90 days ago
Antlera Refactor ZenFlow function callings in DeepSpeedEngine
a528fd47
tohtana commented on 2025-06-30
tohtana Merge branch 'master' into zenflow_zero1_2
9aac3c0b
JoshWoo2003 Fix bugs in ZenFlow + ZeRO Stage 1 and gradient reduction logic
f7bc35d7
JoshWoo2003 Add unit tests for ZenFlow with ZeRO Stage 1 and 2
3638d789
Antlera Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into z…
fad8498d
Antlera Refactor ZenFlow integration using seperate engine file
6d683302
Antlera Fix missing `[comm_dtype]` and format code
913f9a7d
tohtana Merge branch 'master' into zenflow_zero1_2
6b8c82ab
Antlera
Antlera
tohtana
Antlera
tohtana
Antlera
Antlera Update CPUADAM core range calculation in zenflow_stage_1_and_2.py
bce0a7f8
Antlera Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
6f51348b
JoshWoo2003 Fix bugs in ZenFlow unit tests
0ef3fafb
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
a6235566
Antlera Merge remote-tracking branch 'origin/zenflow_zero1_2' into clr_branch…
b898eafe
Antlera Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
e2a2b816
Antlera Fix: Add PyTorch version check for ZenFlow configuration
8d6b6f34
Antlera
Antlera
tohtana Merge branch 'master' into zenflow_zero1_2
1e70efab
Antlera Enhance ZenFlow compatibility checks for PyTorch version
891ac093
Antlera Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
0d7d0864
Antlera
JoshWoo2003
tohtana
tohtana
loadams Merge branch 'master' into zenflow_zero1_2
da902eb6
JoshWoo2003 Fix bugs in ZenFlow unit tests when using CPU Torch
d2d1a06e
JoshWoo2003
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
4cb3178a
sfc-gh-truwase commented on 2025-08-02
sfc-gh-truwase commented on 2025-08-02
sfc-gh-truwase commented on 2025-08-02
tjruwase Merge branch 'master' into zenflow_zero1_2
4d1db6d4
Antlera Added TODO comments to indicate the need for removing ZenFlow-specifi…
f3b22769
Antlera Merge branch 'zenflow_zero1_2' of github.com:Antlera/DeepSpeed into c…
e48622c5
Antlera Fix formatting in test_zf.py
bbb6f744
Antlera Update docs/_tutorials/zenflow.md
9f4fb585
Antlera force pushed from 6ecbc01d to 9f4fb585 55 days ago
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
df701501
delock commented on 2025-08-07
delock commented on 2025-08-08
Antlera Merge branch 'master' into zenflow_zero1_2
29c5f282
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
dc505a75
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
sfc-gh-truwase commented on 2025-08-10
Antlera Fix copyrights.
938e8a3c
Antlera
Antlera Remove CUDA specific code.
8951fa08
sfc-gh-truwase
sfc-gh-truwase approved these changes on 2025-08-11
Antlera
sfc-gh-truwase
Antlera Merge branch 'master' into zenflow_zero1_2
ef166ad9
Antlera
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
b849af66
Antlera
Antlera
Antlera Merge branch 'master' into zenflow_zero1_2
0593fc21
Antlera
tjruwase Merge branch 'master' into zenflow_zero1_2
671c5688
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
8ed876cb
sfc-gh-truwase enabled auto-merge (squash) 42 days ago
sfc-gh-truwase Merge branch 'master' into zenflow_zero1_2
900491e7
sfc-gh-truwase merged 1d7b90ad into master 42 days ago
