Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs #6964
delock commented on 2025-01-21
Reduced the experts allreduce number per layer to ONCE for the Qwen2-…
c9b12af9
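For context, the optimization the PR title and this commit describe is, roughly: under tensor parallelism each rank holds a shard of every routed expert, so the per-expert partial outputs can be combined locally first and a single allreduce issued per MoE layer, instead of one allreduce per expert. A minimal sketch of that idea (shapes and names are illustrative, not the PR's actual code):

```python
import torch
import torch.distributed as dist

def moe_layer_output(expert_partial_outputs, routing_weights):
    """Combine per-expert partial results locally, then reduce across
    tensor-parallel ranks ONCE per MoE layer instead of once per expert.

    expert_partial_outputs: list of [tokens, hidden] partial outputs,
    one per selected expert; routing_weights: [tokens, num_selected].
    """
    combined = torch.zeros_like(expert_partial_outputs[0])
    for i, partial in enumerate(expert_partial_outputs):
        combined += routing_weights[:, i:i + 1] * partial
    if dist.is_initialized():
        dist.all_reduce(combined)  # single collective per layer
    return combined
```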
Fixed format
590ea36a
Removed print
889c2750
Fix a bug about set.
2ec6c347
Add the missing view operations from sequence parallel(async). (#6750)
504d696f
Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_n…
c266dc98
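Illustrative only: the kind of replacement the truncated commit title suggests, moving from the deprecated `torch.norm` to its `torch.linalg` counterparts:

```python
import torch

x = torch.randn(4, 8)

old = torch.norm(x)                       # deprecated API
new = torch.linalg.norm(x)                # Frobenius norm for a 2-D input
vec = torch.linalg.vector_norm(x, ord=2)  # flattens the input, 2-norm over all elements
```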
Using explicit GPU upcast for ZeRO-Offload (#6962)
ae129935
Update version.txt after 0.16.3 release (#6965)
deb09a3b
Precisely track nvme optimizer offload (#6963)
128d436e
Update build_win.bat script to exclude GDS op as it lacks Windows supp…
864472b3
Add CUDA 12.8 support and comment on CUDA 12.7 (#6975)
1ac398c1
Update torch versions to support 2.6 (#6977)
eda53d8b
generalize deepspeed linear and implement it for non-CUDA systems (#6…
112a7c6a
Update recommended Windows whl building versions (#6983)
7d2c5fec
Title: Fix setup_env_ranks to Properly Set Environment Variables Inst…
f1d326c2
Specify torchvision in nv-ds-chat workflow (prevents errors with torc…
46545d77
Remove assumption that padding only occurs on last rank (#6974)
af1ba94e
Use ds-specific module id to avoid conflicts (#6847)
e235921f
Update A6000 workflows to use newer docker container - 24.09 vs 24.03…
f5e97963
Allow NVIDIA Blackwell (#6991)
07634b96
Update GH org references (#6998)
0e57fa02
Update CNAME
e86c0c30
Update CNAME
0d7f0eb0
[XPU] max1100 workflow update for docker and software (#7003)
cd8a9887
autotp training (fix dco) (#7004)
18c712fc
import triton files when triton is supported and installed (#6989)
c5bf6f64
Update A6000 tests transformers version (#7016)
590de5fe
Fix ds-chat CI regression (#7015)
693c39ff
[Ulysses tutorial] typos (#7024)
322a05a6
fix hostname -I for macOS #6497 (#6990)
8869d789
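The referenced issue is that macOS's `hostname` does not support the `-I` flag used to discover the local IP. A hedged, portable alternative (illustrative, not the exact fix in this commit) is to resolve the address through a UDP socket:

```python
import socket

def local_ip():
    # Connecting a UDP socket sends no packets; it only selects the
    # outbound interface, whose address we then read back.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    finally:
        s.close()
```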
Update workflows to cuda 12.4 (#7000)
e4d03af5
[ROCm] Enable fp_quantizer on ROCm (#7027)
8c6251da
add gds chinese blog (#7034)
e3e179ca
Add chinese blog for deepspeed windows, and fix format (#7035)
fd2787b3
AIO on ROCM (#7023)
ba8ef574
Control trace cache warnings (#7039)
f4b0f586
Update CUDA compute capability to support Blackwell (#7047)
3ca3e2fb
Update setup.py handling of ROCm cupy (#7051)
56127786
nv-ds-chat breaks with latest transformers (#7052)
af8c1900
Rename aio_thread_count to intra_op_parallelism (#7056)
225471ad
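A hedged illustration of the rename this commit describes, in DeepSpeed-config form; only the renamed key comes from the commit title, and the surrounding structure is an assumption for illustration:

```python
aio_config = {
    "aio": {
        # "thread_count": 8,         # old name, per the commit title (aio_thread_count)
        "intra_op_parallelism": 8,   # new name, per the commit title
    }
}
```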
add autoTP training zero2 tests (#7049)
1df293a6
Fix, bf16 optimizer remove dup loop (#7054)
94abf682
Update version.txt after 0.16.4 release (#7063)
4a4ff9ba
fix an outdated doc wrt CUDA_VISIBLE_DEVICES (#7058)
e5eda47f
Tecorigin sdaa accelerator (#6903)
675ec9af
Handle special case of libuv for Windows (#7064)
81c1fee8
Update README with info on newest accelerator (#7065)
17f544cb
Bug Fix for offload_states API (#7050)
20fd872c
Fix TOCTOU issues, switch to fstat (#7067)
0b289a26
config torch to avoid graph breaks caused by logger (#6999)
4a86d02e
Fix meta load tensor incompatible issue (#7073)
594b5bb1
Replace calls to `python setup.py sdist` with `python -m build --sdis…
a843e399
Revert "Handle special case of libuv for Windows (#7064)" (#7076)
4cbc52c0
Add DeepseekV3 AutoTP. (#7045)
586e4366
Improve inference tutorial docs (#7083)
5e379ada
Added support for the environment variable DS_MOE_EXPERTS_REDUCE_ONCE…
13bf8662
Changed env variable name to 'DS_MOE_TP_SINGLE_ALLREDUCE'
d5115bed
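A hedged usage sketch of the environment variable introduced and then renamed in these two commits; the inference call shown is illustrative, so check the DeepSpeed docs for the exact arguments:

```python
import os

# Enable the single-allreduce MoE tensor-parallel path (the variable was
# renamed from DS_MOE_EXPERTS_REDUCE_ONCE to DS_MOE_TP_SINGLE_ALLREDUCE).
os.environ["DS_MOE_TP_SINGLE_ALLREDUCE"] = "1"

# import deepspeed
# engine = deepspeed.init_inference(model, tensor_parallel={"tp_size": 4})  # illustrative call
```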
Pin transformers version on tests that use latest. (#7085)
f0044cbc
Update README.md with ICS '23 MoE paper link (#7087)
16ad5fd7
Update parallelism for nv-torch-latest/nightly tests due to more GPUs…
47d4420f
Remove workflows for very old torch versions (#7090)
b3c64dd3
gyou2021 force-pushed from f3c6b431 to b3c64dd3 209 days ago
gyou2021 changed the title from "Enabled high-performance Automatic Tensor Parallelism (auto TP) for the Qwen2-MoE and DeepSeek-V2 models on multiple GPUs/HPUs" to "Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs" 209 days ago
Fixed conflicts
9b1fe98d
Update auto_tp.py
6b96dd9e
Merge branch 'master' into autoTP_Qwen2Moe_DeepSeekv2
e7883e7a
Assignees: No one assigned