microsoft/DeepSpeed
Pull Requests
Commits
Open
[Bloom] Fix hangs of bloom test
#7890 opened 2026-03-06 19:03 by k-artem
fix: Validate fp16.loss_scale is finite and non-negative
#7889 opened 2026-03-06 07:44 by nathon-lee
Fix Stage 0 + Ulysses crash: make bwc_tensor_model_parallel_rank() resilient to MP API absence
#7888 opened 2026-03-06 06:59 by nathon-lee
[SP] add SP deny list instead of allow
#7887 opened 2026-03-05 13:32 by kashif
Fix Evoformer's multi-arch dispatch root cause
#7881 opened 2026-03-02 19:45 by tohtana
fix(zero): Ensure full gradient reduction for Muon optimizer with reduce_scatter
#7878 opened 2026-02-27 06:46 by nathon-lee
fix: correct DistributedAttention output shape and pad uneven sequence lengths (#7842)
#7868 opened 2026-02-22 11:00 by harshang03
fix: keep fp32-pinned parameters out of the bf16 cast path in ZeRO-3 (#7747)
#7867 opened 2026-02-22 10:52 by harshang03
Revert "fix: remove premature MPI environment variable check in OpenMPIRunner"
#7864 opened 2026-02-21 01:39 by mikloorbi-sys
[WIP] Merging AutoSP into DeepSpeed
#7860 opened 2026-02-19 06:17 by neeldani
Fix global .cuh ignore and enforce tracked CUDA headers
#7858 opened 2026-02-18 04:38 by harshang03
Fix ZeRO legacy grad-hook crash when next_functions is missing
#7857 opened 2026-02-17 22:07 by harshang03
Reject non-finite fp16 loss_scale across config and ZeRO paths
#7856 opened 2026-02-17 18:13 by harshang03
Fix zero/division safety gaps in utility and inference paths
#7855 opened 2026-02-17 18:05 by harshang03
Fix count_used_parameters_in_backward crash on PyTorch < 2.3 (#7756)
#7849 opened 2026-02-12 20:06 by harshang03
[BUG] Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication
#7847 opened 2026-02-12 06:54 by Thinksky5124
Fix subgroup optimizer metadata inconsistency
#7820 opened 2026-01-27 11:19 by st-bang97
[Draft] Muon Optimizer Support for ZeRO3
#7798 opened 2026-01-20 03:49 by PKUWZP
Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights
#7792 opened 2026-01-18 05:04 by juyterman1000
Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2
#7776 opened 2026-01-12 11:44 by fy817
Fix: ZenFlow Adam integration for updated PyTorch backward flow (#7759)
#7771 opened 2026-01-11 06:48 by Antlera
Fix Muon optimizer checkpoint resume with bf16 mode
#7748 opened 2025-12-28 22:21 by yurekami
Introduce Megatron-style parallel state management
#7726 opened 2025-12-15 12:40 by eternalNight
Let allgather and alltoall execute in parallel when both attention and MoE use TP
#7723 opened 2025-12-11 07:51 by taozhiwei
fix: when tensors are registered via register_buffer in the weight file, their weights are loaded only on device 0 when loading across multiple devices
#7717 opened 2025-12-08 03:57 by KeeProMise
If no expert is found in a parameter that has "expert" in its name, the loop should continue
#7685 opened 2025-11-11 19:28 by LckyLke
Configure workflow for offline unit tests
#7512 opened 2025-08-24 16:22 by porfanid
Add world-size getter in Engine
#7479 opened 2025-08-09 09:01 by WoosungMyung
Add EXAONE 4.0 model support for DeepSpeed inference v2
#7456 opened 2025-07-29 01:48 by notkisk
Create COMMITTERS_RESPONSIBILITY.md
#7300 opened 2025-05-21 14:25 by PKUWZP