deepspeedai/DeepSpeed
Pull Requests (Open)
#5518 Fix deadlock in PipeEngine._exec_recv_grads (opened 2024-05-10 02:45 by i4never)
#5483 Make the quantized data shape compatible with original tensor shape (opened 2024-04-30 05:05 by sfc-gh-reyazda)
#5424 uniform deepspeed overflow check (opened 2024-04-16 22:40 by GuanhuaWang)
#5423 Adding DS Feature API in accelerator (opened 2024-04-16 20:54 by duli2012)
#5382 Update names of CPU Adam/Adagrad/Lion params to better match torch/GPU ops. (opened 2024-04-08 21:56 by loadams)
#5325 Disable compile for Z3 hook function (opened 2024-03-28 00:38 by tohtana)
#5258 Disable torch.nn.init when counting parameters in initializing PipelineModule (opened 2024-03-12 06:07 by tanconghui)
#5245 Set tp to 1 when MPU is None for bf16 optimizer (opened 2024-03-08 23:42 by samadejacobs)
#5224 apply reduce_scatter_coalesced op (opened 2024-03-04 12:50 by inkcherry)
#5137 Loadams/cpu inf v0 docker (opened 2024-02-15 18:51 by loadams)
#5126 move CPU_Accelerator --> Xeon_Accelerator (opened 2024-02-13 18:25 by mrwyattii)
#5120 Add HIP device abstraction, update Triton skip logic (opened 2024-02-12 20:42 by lekurile)
#5082 TEST: PR HIP-ifying and running the bias_activations kernel on AMD (opened 2024-02-05 18:52 by lekurile)
#4961 Workflow for AutoTP (opened 2024-01-16 10:05 by delock)
#4937 Support Triton 2.2+ (opened 2024-01-11 17:45 by loadams)
#4849 Add Cache to Comm Group (opened 2023-12-20 22:18 by cmikeh2)
#4847 Retrieve CUDA available memory via `torch.cuda.mem_get_info()` (opened 2023-12-20 14:41 by XuehaiPan)
#4771 Support FP16 CpuAdam + Zero Stage 3 (opened 2023-12-04 21:39 by lz1oceani)
#4750 support autoTP with weight only quantization in DS inference path (opened 2023-11-29 05:54 by ftian1)
#4735 SP Comm-optimization: fuse query, key, and value all-2-all for better SP performance (opened 2023-11-28 00:45 by RezaYazdaniAminabadi)
#4706 Add simple layout for creating multi-dimensional parallelism (opened 2023-11-20 17:00 by RezaYazdaniAminabadi)
#4577 Add more weight only quantization algorithms into DeepSpeed inference. (opened 2023-10-27 06:00 by ftian1)
#4493 Fixed bug with hybrid engine generation when inference_tp_size > 1 (opened 2023-10-10 07:55 by hxdtest)
#4451 Fix assert on Lamb optimizers with BF16 (opened 2023-10-04 17:50 by loadams)
#4383 Destroy ZeRO (opened 2023-09-21 21:35 by jomayeri)
#4351 DS-Inference Quantization refresh: Fix several issues and add more features (opened 2023-09-17 18:43 by RezaYazdaniAminabadi)
#4329 Switch modeling to use transformers and torch version for BERT (opened 2023-09-13 23:28 by loadams)
#4323 Remove symlinks (opened 2023-09-13 18:41 by mrwyattii)
#4267 Refactor the injection to accept policy when using kernels (opened 2023-09-05 17:38 by RezaYazdaniAminabadi)
#4228 fix: fixed the communication problem of pp when using sequence parallel (opened 2023-08-28 07:53 by LiuXTao)
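One entry above (#4847) proposes reading free device memory through `torch.cuda.mem_get_info()` instead of deriving it from allocator statistics. A minimal sketch of that call follows; the helper name `cuda_free_memory` is ours for illustration, not taken from the PR, and the sketch simply returns None when PyTorch or a CUDA device is unavailable.

```python
def cuda_free_memory():
    """Return (free_bytes, total_bytes) for the current CUDA device,
    or None when PyTorch or a CUDA device is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # mem_get_info wraps cudaMemGetInfo: it reports device-wide free/total
    # memory, including memory held by other processes on the same GPU.
    free, total = torch.cuda.mem_get_info()
    return free, total

print(cuda_free_memory())
```

On a CUDA machine this prints a `(free, total)` byte tuple; elsewhere it prints `None`, which is why the guard clauses matter in code paths shared across accelerators.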