deepspeedai/DeepSpeed
Open Pull Requests
Revert "Add index to HPU devices (#7497)"
#7545 opened 2025-09-04 13:43 by deepcharm
[MoE] Fix misuse of num_experts as expert parallel group size (ep_size)
#7537 opened 2025-09-02 17:27 by Flakes342
Add ZenFlow code for Stage 3
#7516 opened 2025-08-26 02:07 by JoshWoo2003
Configures workflow for offline unit tests
#7512 opened 2025-08-24 16:22 by porfanid
DeepCompile ZeRO-3: robust allgather for uneven shards; fix profiling…
#7489 opened 2025-08-15 00:30 by juyterman1000
Add world-size getter in Engine
#7479 opened 2025-08-09 09:01 by WoosungMyung
Add EXAONE 4.0 model support for DeepSpeed inference v2
#7456 opened 2025-07-29 01:48 by notkisk
[AMD][ROCm] Improve support of AMD
#7448 opened 2025-07-24 11:45 by k-artem
Create COMMITTERS_RESPONSIBILITY.md
#7300 opened 2025-05-21 14:25 by PKUWZP
HF2UCP: Converting a `pytorch_model.bin` or `.safetensors` checkpoint to UCP
#7212 opened 2025-04-10 10:13 by Schwidola0607
Gather output layout support for column parallel
#7181 opened 2025-03-28 03:18 by inkcherry
Add DataStates-LLM: Asynchronous Checkpointing Engine Support
#7166 opened 2025-03-21 19:14 by mauryaavinash95
[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz)
#7130 opened 2025-03-11 08:54 by cyr0930
[Draft] Add support for seq split in Domino
#7111 opened 2025-03-04 21:19 by duanhx1037
Update Domino for Llama3
#7084 opened 2025-02-26 20:08 by shenzheyu
Fix: pipeline model with MoE causes error when sending grad
#7055 opened 2025-02-19 11:53 by wukong1992
Add `pyproject.toml` with legacy build backend to keep most logic in `setup.py`
#7033 opened 2025-02-13 18:10 by loadams
Enable python 3.11 and 3.12 tests
#7007 opened 2025-02-06 00:03 by loadams
Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs
#6964 opened 2025-01-21 08:18 by gyou2021
[FPDT] Support FPDT Based on Intel Backend
#6956 opened 2025-01-16 08:38 by YizhouZ
Update sharded_moe.py to support top2 gate with Tutel
#6948 opened 2025-01-14 20:11 by xenshinu
Fix: forbid repeated deepspeed.initialize on training objects
#6874 opened 2024-12-16 00:18 by traincheck-team
Training ops kernels: Speeding up the Llama-based MoE architectures
#6734 opened 2024-11-08 23:21 by RezaYazdaniAminabadi
Update MII tests to support transformers latest
#6686 opened 2024-10-29 17:27 by loadams
Support the parallel conversion from ZeRO checkpoints to FP32/FP16/BF16 param weight
#6655 opened 2024-10-23 03:51 by xylian86
modify_load_save_model
#6626 opened 2024-10-15 03:22 by ssklzx
Improve consistency of zero_grad
#6554 opened 2024-09-18 20:27 by tohtana
Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models
#6553 opened 2024-09-18 12:25 by gyou2021
Unpin tests that previously used a pinned version of transformers
#6387 opened 2024-08-20 21:16 by loadams
Hybrid Offloading for ZeRO3
#5625 opened 2024-06-07 01:45 by tohtana