Pull Requests microsoft/DeepSpeed

Fix zero/division safety gaps in utility and inference paths

#7855 opened 2026-02-17 18:05 by harshang03

Fix count_used_parameters_in_backward crash on PyTorch < 2.3 (#7756)

#7849 opened 2026-02-12 20:06 by harshang03

[BUG] Fix: Fix gradient norm calculation and dynamic shape blocking in PP+ZeRO1 collective communication

#7847 opened 2026-02-12 06:54 by Thinksky5124

Fix bf16 dtype mismatch in ZeRO-3 with zero_quantized_weights

#7792 opened 2026-01-18 05:04 by juyterman1000

Fix Muon optimizer conflict with gradient clipping in ZeRO 1/2

#7776 opened 2026-01-12 11:44 by fy817

Fix Muon optimizer checkpoint resume with bf16 mode

#7748 opened 2025-12-28 22:21 by yurekami

Introduce Megatron-style parallel state management

#7726 opened 2025-12-15 12:40 by eternalNight

let allgather and alltoall execute in parallel when both attention and MOE used TP

#7723 opened 2025-12-11 07:51 by taozhiwei

fix: When there are tensors registered with register buffer in the weight file, the weights are only loaded on device 0 when loading weights across multiple devices.

#7717 opened 2025-12-08 03:57 by KeeProMise

if no expert found in parameter that have expert in name the loop should continue

#7685 opened 2025-11-11 19:28 by LckyLke

Configures workflow for offline unit tests

#7512 opened 2025-08-24 16:22 by porfanid

Add world-size getter in Engine

#7479 opened 2025-08-09 09:01 by WoosungMyung

Add EXAONE 4.0 model support for DeepSpeed inference v2

#7456 opened 2025-07-29 01:48 by notkisk

Create COMMITTERS_RESPONSIBILITY.md

#7300 opened 2025-05-21 14:25 by PKUWZP

HF2UCP: Converting a `pytorch_model.bin` or `.safetensors` checkpoint to UCP

#7212 opened 2025-04-10 10:13 by Schwidola0607

gather output layout support for column parallel

#7181 opened 2025-03-28 03:18 by inkcherry

[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz)

#7130 opened 2025-03-11 08:54 by cyr0930

[Draft] Add support for seq split in Domino

#7111 opened 2025-03-04 21:19 by duanhx1037

Update Domino for Llama3

#7084 opened 2025-02-26 20:08 by shenzheyu

Fix, pipeline model with moe cause error when send grad

#7055 opened 2025-02-19 11:53 by wukong1992

Add `pyproject.toml` with legacy build backend to keep most logic in `setup.py`

#7033 opened 2025-02-13 18:10 by loadams

Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs

#6964 opened 2025-01-21 08:18 by gyou2021

[FPDT] Support FPDT Based on Intel Backend

#6956 opened 2025-01-16 08:38 by YizhouZ

Update sharded_moe.py to support top2 gate with Tutel

#6948 opened 2025-01-14 20:11 by xenshinu

Fix: forbid repeated deepspeed.initialize on training objects

#6874 opened 2024-12-16 00:18 by traincheck-team

Training ops kernels: Speeding up the Llama-based MoE architectures

#6734 opened 2024-11-08 23:21 by RezaYazdaniAminabadi

Update MII tests to support transformers latest

#6686 opened 2024-10-29 17:27 by loadams

Support the parallel conversion from ZeRO checkpoints to FP32/FP16/BF16 param weight

#6655 opened 2024-10-23 03:51 by xylian86

modify_load_save_model

#6626 opened 2024-10-15 03:22 by ssklzx

Improve consistency of zero_grad

#6554 opened 2024-09-18 20:27 by tohtana