Improve overflow handling in ZeRO #6976
Improve overflow handling in ZeRO
a3a18f72
Unit test and pydantic configuration
19431f80
Formatting fixes
406cf26f
Merge branch 'master' into olruwase/ds_5241
35570f54
Remove unused symbol
cb784448
Fix typo
ee1c1fd0
Pydantic fp16 config
0b2cf73a
Fix more typos
c7a90f9f
Address #4986
3694e07d
Merge branch 'master' into olruwase/ds_5241
2bbcf00f
Merge branch 'master' into olruwase/ds_5241
c1b87ead
Merge branch 'master' into olruwase/ds_5241
5da6cd0f
Merge branch 'master' into olruwase/ds_5241
a65d20c9
Fix typo
ae039b29
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
04461922
Merge branch 'master' into olruwase/ds_5241
5d48745d
Merge branch 'master' into olruwase/ds_5241
05c362d9
Merge branch 'master' into olruwase/ds_5241
5e17ed67
Merge branch 'master' into olruwase/ds_5241
06bb3a61
Fix min loss scale
0d0ab3d4
Merge branch 'master' into olruwase/ds_5241
cccd5b11
Fix UTs
2c6f6307
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
21bfca08
Merge branch 'master' into olruwase/ds_5241
5fe58101
Using explicit GPU upcast for ZeRO-Offload (#6962)
732ceb7c
Update version.txt after 0.16.3 release (#6965)
db9aff9f
Precisely track nvme optimizer offload (#6963)
4edeb033
Update build_win.bat script to exclue GDS op as it lacks Windows supp…
f00f4ea5
Improve overflow handling in ZeRO
c3846faa
Unit test and pydantic configuration
7d56ffa9
Formatting fixes
6ca11efa
Add CUDA 12.8 support and comment on CUDA 12.7 (#6975)
49f3df86
Update torch versions to support 2.6 (#6977)
8364b125
Remove unused symbol
ea9b4732
Fix typo
d2425a2a
Pydantic fp16 config
7d5be078
Fix more typos
e8fc098a
Address #4986
2bbb7b4f
generalize deepspeed linear and implement it for non cuda systems (#6…
3ab5e885
Fix typo
271db941
Update recommended Windows whl building versions (#6983)
b1900af1
Title: Fix setup_env_ranks to Properly Set Environment Variables Inst…
e3d10e5a
Specify torchvision in nv-ds-chat workflow (prevents errors with torc…
b8d8e390
Remove assumption that padding only occurs on last rank (#6974)
fde7df1f
Use ds-specific module id to avoid conflicts (#6847)
b0b01321
Update A6000 workflows to use newer docker container - 24.09 vs 24.03…
353ab08b
Allow NVIDIA Blackwell (#6991)
14189a72
Update GH org references (#6998)
75996f89
Fix min loss scale
b23c545c
Fix UTs
7cd3a9f9
Update CNAME
2c5629e0
Update CNAME
6b156883
[XPU] max1100 workflow update for docker and softwares (#7003)
3773d837
autotp training(fix dco) (#7004)
64c4b04c
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
5fa29105
Merge branch 'master' into olruwase/ds_5241
1f5a672a
Fix ds-chat CI regression
98821161
Merge branch 'olruwase/ds_7014' of github.com:microsoft/DeepSpeed int…
97d79158
Fix bug
4a1dd0fc
Avoid naming collision on partition()
0ac44574
Merge branch 'master' into olruwase/ds_5241
1597d48b
Use new API
2ae20626
Merge branch 'master' into olruwase/ds_7014
9fb73a4d
Merge branch 'olruwase/ds_7014' of github.com:microsoft/DeepSpeed int…
26fa8af3
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
b565d778
Merge branch 'master' into olruwase/ds_5241
d098c322
Merge branch 'master' into olruwase/ds_5241
990a5ad8
Merge branch 'master' into olruwase/ds_5241
9b1b030b
Merge branch 'master' into olruwase/ds_5241
1953c38f
Code cleanup
2ea182ef
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
9aff2087
Merge branch 'master' into olruwase/ds_5241
36c55d24
Merge branch 'master' into olruwase/ds_5241
80fcb83b
Merge branch 'master' into olruwase/ds_5241
776385fc
Merge branch 'master' into olruwase/ds_5241
e5f64af1
Use new dlpack api; Formatting fixes
61685dc0
Merge branch 'olruwase/new_dlpack_api' of github.com:microsoft/DeepSp…
75ac86cf
Merge branch 'master' into olruwase/ds_5241
6b9736c5
Triage pytest --forked cupy failure
83850adb
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
4d56c995
Revert pytest debugging
5e76c7dd
Merge branch 'master' into olruwase/ds_5241
a59cb55f
Merge branch 'master' into olruwase/ds_5241
f10a2f21
Merge branch 'master' into olruwase/ds_5241
919f5385
Merge branch 'master' of github.com:microsoft/DeepSpeed into olruwase…
4b583262
Merge branch 'olruwase/ds_5241' of github.com:microsoft/DeepSpeed int…
75203d72
UT workaround
08a07cbc
Merge branch 'master' into olruwase/ds_5241
728dd387
Merge branch 'master' into olruwase/ds_5241
2ac92112
Merge branch 'master' into olruwase/ds_5241
2d6913a1
Merge branch 'master' into olruwase/ds_5241
55395db3
Merge branch 'master' into olruwase/ds_5241
1f38d597
Merge branch 'master' into olruwase/ds_5241
58e61a04
Merge branch 'master' into olruwase/ds_5241
fa30042b
Merge branch 'master' into olruwase/ds_5241
e76bfd83
Merge branch 'master' into olruwase/ds_5241
9b4289a1
Merge branch 'master' into olruwase/ds_5241
16bcd901
Merge branch 'master' into olruwase/ds_5241
6373a578
loadams approved these changes on 2025-06-09
Merge branch 'master' into olruwase/ds_5241
99f356dd
loadams enabled auto-merge (squash) 201 days ago
loadams merged e440506b into master 201 days ago
loadams deleted the olruwase/ds_5241 branch 201 days ago