Go
Home
Pricing
FAQ
Install
microsoft/DeepSpeed
Pull Requests
Commits
mrwyattii/silence-backend-warning
AutoPR/0.12.2
AutoPR/0.14.0
AutoPR/0.14.5
CUDA-Graph-support
HeyangQin/deepspeed-ulysses-chinese-blog
HeyangQin/enable_hpz_nograd
HeyangQin/fastgen_moe_h100
HeyangQin/fix_hpz_nograd
HeyangQin/fix_issue_3062
HeyangQin/fix_issue_3068
HeyangQin/fix_issue_3156
HeyangQin/fix_issue_5205
HeyangQin/fix_pr_3462_standalone
HeyangQin/hpz_convergence
HeyangQin/inference_t5_phase1
HeyangQin/mixed_precision_lora_sam
HeyangQin/mixz_tutorial
HeyangQin/skip_bias_quant
HeyangQin/staging-zero-pp-v1
HeyangQin/ucp_blog_chinese
HeyangQin/ulysses_fp8
Megtron-Kernel-Integration
SA_feature_tag
SA_tutorial_update
SA_update_tutorial_link
add-bfp16-support
add-comm-layout
add-inference-comm
add-llama2-support
add-quantizer
add-shared-lib
adk9/phi3-inference
adk9/phi3-small
adk9/update-minor-cuda
amawa/add-moe-container
amawa/aml-get-hosts
amawa/auto-save-ckpt
amawa/config-pass-down
amawa/debug
amawa/fix-amd-rocm
amawa/fix-auto-tp-load-ckpt
amawa/fix-tracer-zero3
amawa/fix-z3-for-hf-accelerate
amawa/fix-z3-warn-print-v2
amawa/inference-fix
amawa/remove-deepcopy
amawa/split-a2a
amawa/zero-inf-refactor
amawa/1-bit-alltoall
amawa/1bit-adam-nccl
amd-jiting
aml-autotuner
arashb/fix-phi-2
arashb-patch-1
arpan/auto-check
autocast-fix
awan-10-patch-1
awan-10-patch-2
awan-10-patch-3
azure
big-science
big-science-v2
bing/debugging
bing/ds-adam
bing/formatting-correction
bing/io-tutorial
bing/modify-ds-optimizer
bing/optimizer-naming
bloom-debug
chatgpt-chinese-blog
check-linear-sizes
cholmes/activation-utils
cholmes/checkpoints-inference-v2-2
cholmes/comm-group-cache
cholmes/fix_reduction_utils_amd
cholmes/fix-asym-quant
cholmes/isolate-src-code
cholmes/kv-cache-flexibility
cholmes/mem-access-predicated-load
cholmes/migrate-to-dequant-lib
cholmes/pipelined-quant
cholmes/reduce-quantized-gpus
cholmes/sd-extension
cholmes/ts-builder
cholmes/unique-cuda-graphs
ckpt-fix-unfused
clean-llama
clean-llama-v2
clean-opt
clean-opt-base
clean-opt-v2-base
clean-opt-v2
codegen-inference
comm-opt2
costineseanu/windows_inference_build
cpu-adam/optional_CUDA-copy
debug-base-attn
debug-ds-inf
debug-ds-inf-torch-matmul
ds-chat-blog-8-31
ds-chat-clean-opt
ds-chat-news
ds-chat-release
ds-inference/add-falcon-support
ds-inference/bloom-support-meta
ds-inference/fix-generation
ds-inference/fix-mp
ds-inference/remove-randgen
ds-inference/simplify
ds-inference/support-large-token-length
ds-seq-tutorial
ds-vchat-blog-v1
ds-vchat-blog-v2
duli/capability
duli/cuda_op_builder
duli/op_builder
duli/pre_post
duli/zero_debugging
elastic-ckpt-refresh
elasticity-v2
eltonz/copy_grad_stream
enable-neox
encoded-ds-config
fairseq-moe
fairseq-moe-debug
falcon-180b
fastgen-blog
fastgen-blog-2
features/rebase-quant-fp6
fix_mpu_ckpt
fix-MoQ
fix-autotuning-docs
fix-autotuning-exit
fix-autotuning-reqs
fix-flops-profiler
fix-fp16-test
fix-injection
fix-max_train_batch_size
fix-misaligned-grad
fix-moe-top1gating
fix-sp-dense
fix-sparse-attn
fix-tuner-prescale_gradients
fix-tuner-scheduler-bug
fix-twitter
fix-typos
flash-attention
flops-profiler-skip-unused-args
fp6-blog
fs/soft-kernel
fs-82
fs-soft-kernel
fs-z2-fix
gcooper/make_optimizer_optional
generic-ckpt-loading
gh-pages
gh-readonly-queue/master/pr-3852-3491e32d72746ec3d990108a23e67b2666b3e0e0
gh-readonly-queue/master/pr-3852-adb9bc14b780115fd54f3f1234abcb7ab52fa975
gh-readonly-queue/master/pr-3854-85503dab878875175b6d5eb6a39125878c172273
gh-readonly-queue/master/pr-3892-9f8817b2425bb82d9b6355caa6d2d0ebd036885d
gh-readonly-queue/master/pr-3892-548451ba4e8ea71029d738c33f639e0439aad1dd
gh-readonly-queue/master/pr-3893-cc71eec8c85c4437d8139e53372da7f22224fed5
gh-readonly-queue/master/pr-3928-82115d9059ce8271229c8f63153a02f2d323cfc1
gh-readonly-queue/master/pr-4163-5e16eb2c939707d0d0062a458d77998fccb3afad
gma/rollback_6726
gma/xpu_compile_analysis
good-moe
gpt2-debug
guanhua/adam-timer
guanhua/adam-timer2
guanhua/check-bf16
guanhua/fix-cutlass-ver
guanhua/h2d-offload
guanhua/kernel-test
guanhua/mics-fix
guanhua/overflow-check
guanhua/quant-dequant-test
guanhua/quant-test
guanhua/rocm-cpu-adam
guanhua/v14.0-bf16-check
hf-workaround
hp-sam
hpzero-preview
inference/ElutherAI-GPTJ
inference/TP-general-support
inference/add-bf16-support
inference/engine-api
inference/fix-masking
inference/fix-mp-init
inference/support-encoder-decoder
inference-api/tutorial
inference-read-checkpoint
inference-refactor-v1-mro-test
injection-fixes
jeff-test
jeffra/auto-bucket
jeffra/available_memory
jeffra/bf16-updates
jeffra/bf16-updates-v2
jeffra/ci-updates
jeffra/ckpt-barrier
jeffra/docker-update
jeffra/engine-xthru
jeffra/engine-xthru-v2-no-padding
jeffra/engine-xthru-v2
jeffra/external-skip
jeffra/fix-1416
jeffra/fs-diverge
jeffra/fs-gas-fix
jeffra/fs-gas-fix-v2
jeffra/fs-support
jeffra/fs-z3-v0510
jeffra/fs-z3
jeffra/gptj-fixes
jeffra/inf-engine-refactor
jeffra/inf-tests
jeffra/jit-fix
jeffra/latest-hf
jeffra/op-build-api
jeffra/prepost_fwd_and_generate
jeffra/saksham-zero1-fixes
jeffra/savepid2
jeffra/shm-report
jeffra/staging-comms-logging-v1
jeffra/turn-on-opt-test
jeffra/update-z3-check
jeffra/z1-refresh
jeffra/z1-refresh-2
jeffra/z1-refresh-3
jeffra/z3-fix
jeffra/z3-new-param
jeffra/zero1-grad-norm
jeffra/zero-1-fix
jeffra/zero-1-fix-test
jeffra/zero-ckpt-fixes
jeffra/zero-moe-noCG
jeffra/1node-launcher-fix
jeffra/2904
jeffra-patch-2
jerasley/mac
jomayeri/aio-locked-tensor
jomayeri/aio-mem-fix
jomayeri/aio-op-parallel
jomayeri/aio-type-mismatch
jomayeri/bf16-zero-check
jomayeri/bug-5880
jomayeri/debug-2361
jomayeri/deepnvme-perf-debug
jomayeri/destroy-zero
jomayeri/fp8-init
jomayeri/gds-swapper-fix
jomayeri/h100-unittest
jomayeri/he-mp-assert
jomayeri/issue-3367
jomayeri/issue-3560
jomayeri/issue-3598
jomayeri/issue-3769
jomayeri/issue-4083
jomayeri/issue-4095
jomayeri/issue-4183
jomayeri/issue-5087
jomayeri/lr-step-init
jomayeri/lr-step-move
jomayeri/model-param-list
jomayeri/new-zero-accum
jomayeri/swap-with-locked
jomayeri/zero3-hooks
jomayeri/zero-grad-accum
kv-cache-reset
landing-training
landing-updates
lekurile/add_ds_chat_workflow
lekurile/add_hip_abstraction
lekurile/clean_up_params
lekurile/container_param_cleanup
lekurile/debug_bloom
lekurile/ds_chat_attn_mlp_base
lekurile/ds_chat_fix_test
lekurile/ds_chat_gh_wf
lekurile/ds_chat_mlp_debug
lekurile/ds_chat_opt_fix
lekurile/ds_chat_revert_54c06872
lekurile/ds_chat_test_exit_first
lekurile/ds_chat_test_f69f8840
lekurile/ds_chat_test_7b5b0660
lekurile/ds_chat_test_54c06872
lekurile/fix_ds_chat_bloom
lekurile/fix_formatting
lekurile/fix_he_print
lekurile/fix_issue_2330
lekurile/fix_opt_meta_tensor
lekurile/fix_phi_2
lekurile/fix_sd_ci
lekurile/fix_sd
lekurile/fix_unet_vae
lekurile/general_local_cg
lekurile/infv2_lm_eval
lekurile/kernel_hip_amd
lekurile/load_ckpt_inf_eng
lekurile/mlp_functions
lekurile/offload_fix_test
lekurile/sd_min_ver
lekurile/test_rearrange_ops
lekurile/update_ds_chat_ci_test
lekurile/update_ds_chat_ci
lekurile/update_ds_chat_ci_2
lekurile/update_dschat_wf
lekurile/update_inf_ckpt_load
lf-test
loadams/a6000-fix
loadams/a6000-fix-0-15-2
loadams/accelerator-test
loadams/adam-params
loadams/add-contributing-release-md-files
loadams/add-gaudi-badge-readme
loadams/add-scheduled-open-issue-check-ds-chat
loadams/add-torch-2-support
loadams/amd-57
loadams/amd-mi200-tests
loadams/amd-pre-compile
loadams/amd-updates
loadams/auto-stage3-prefetch-bucket-size
loadams/auto-task-open-failure
loadams/azure-blob-storage
loadams/build-for-cpu
loadams/changes-to-op-builder
loadams/check-accelerate
loadams/check-ds-chat-transformers-debug
loadams/check-pydantic-v2-support
loadams/cleanup
loadams/clear-cache
loadams/cpu-inf
loadams/cpu-inf-triggers
loadams/cpu-inf-v0-docker
loadams/cpu-inference-shorten
loadams/cpu-runner-debug
loadams/cpu-torch
loadams/cpu-torch-latest-fix-debug
loadams/cu118
loadams/cuda-compilation-nv-bfloat162
loadams/dc-test
loadams/debug-opbuilder
loadams/debug-torch
loadams/disable-h100-ci
loadams/disable-windows-ops-build-script
loadams/dot-deepspeed_env-test
loadams/dpkg-libaio
loadams/ds-chat-fixes-test
loadams/empty-env-var-setup
loadams/enable-amdmi200
loadams/enable-python
loadams/enable-workflow-dispatch-nv-torch-nightly-v100
loadams/engine-pos-args
loadams/fix-a6000-debug
loadams/fix-a6000-transformers
loadams/fix-check-valid-version
loadams/fix-cpu-inf-test-time
loadams/fix-cuda-build-ops
loadams/fix-docs-rendering
loadams/fix-ds-chat
loadams/fix-fp16-bf16-logging-issue
loadams/fix-hpu
loadams/fix-lightning-pytorch2
loadams/fix-mpi4py
loadams/fix-nccl-comm-torch-check
loadams/fix-no-torch-failure-mlu
loadams/fix-nv-inference
loadams/fix-nv-inference-hang
loadams/fix-nv-torch-latest-v100
loadams/fix-onebit-skip
loadams/fix-torch-2
loadams/fix-torch-compiler-hasattr
loadams/fix-torch-linalg-norm
loadams/fix-triggers-no-torch-workflow
loadams/flops-profiler-scaled-dor-attn-torch-2
loadams/get-amd-team-ci
loadams/get-logs-ci-failure
loadams/gh-cpu-inf
loadams/gh-release-version-update
loadams/hf-transformers-ci-fix
loadams/hpu-uts
loadams/ignore-unused-params-default
loadams/inference-ops-test-repro
loadams/inference-transformers-enable
loadams/lamb-bf16
loadams/libaio
loadams/low-cpu-mem-ut
loadams/lsb-release
loadams/megatron
loadams/megatron-lm-112
loadams/megatron-new-pypi
loadams/megatron-version
loadams/mii-transformers-debug
loadams/more-torch-2-support
loadams/nv-inf-jobs-test
loadams/nv-inf-test
loadams/nv-inference-revert
loadams/nv-nightly
loadams/nv-nightly-fix-transformers
loadams/nv-sd-badge
loadams/opbuildertest
loadams/openmpi-eth0
loadams/pin-torch-latest-ver
loadams/pip-ver
loadams/pre-compile-test
loadams/py36
loadams/pynvml
loadams/pyproject
loadams/pyproject-toml
loadams/pyproject-toml-tests
loadams/recurse-flops-profiler
loadams/reenable-cpu-inference
loadams/reenable-py311-312
loadams/remove-dead-code
loadams/remove-modeling
loadams/remove-python-36-check
loadams/rename-fp-quantize-cu
loadams/rename-nv-torch-latest-cpu-workflow
loadams/revert-4660
loadams/revert-5608
loadams/revert-cpu-inf
loadams/revert-loss
loadams/revert-nv-inference-changes
loadams/revert-pr-5608
loadams/revert-userwarning
loadams/rocm6
loadams/rocm57
loadams/rocm-fixes
loadams/sd-fixes
loadams/sd-paths
loadams/sequential-2
loadams/setup-h100-triggers
loadams/shuffle-data-sampler
loadams/shuffle-true
loadams/shuffle-true-dataloader
loadams/sigterm
loadams/skip-nv-inference
loadams/sparse-attn-fix
loadams/sparse-attn-torch-2
loadams/stablediffusion-test-triton2
loadams/switch-modeling-compression
loadams/tar-vuln
loadams/test-0.15.0
loadams/test-amp-futurewarning
loadams/test-b421e8c8f31af254b63ad6e9839f617ab6d9c060
loadams/test-ccl-fixes
loadams/test-compile
loadams/test-cpu
loadams/test-cpu-inf-fix
loadams/test-f0e3f01d7c7a3d8748212e61eaf487fab41168a7
loadams/test-fix-nv-inference
loadams/test-glibc228
loadams/test-hpu-update-192
loadams/test-merged-changes
loadams/test-model-task
loadams/test-new-numpy
loadams/test-nv-ds-chat-failure-mode
loadams/test-nv-latest-cpu
loadams/test-nv-torch-latest-v100
loadams/test-pydantic-update
loadams/test-pytest-ordering
loadams/test-runsc
loadams/test-toml
loadams/test-toml-2
loadams/test-torch-2.3.0
loadams/test-torch-2.7
loadams/test-transformers-inference
loadams/test-xpu-builds
loadams/torch-cpu-mismatch-cudaopbuilder
loadams/torch-linalg-vectornorm
loadams/torch-nightly-debug
loadams/transformers-ds-chat-debug
loadams/transformers-fixes
loadams/transformers-latest
loadams/transformers-torch
loadams/transformers-torch-update
loadams/transformers-workflow-dispatch
loadams/triton-22-update
loadams/try-bump-pydantic
loadams/unpin-hf-transformers-nv-workflows
loadams/unpin-nv-torch-latest
loadams/unpin-transformers
loadams/unpin-transformers-hpu
loadams/unpin-transformers-latest
loadams/unpin-transformers-latest-a6000
loadams/update-2004-checkout-actions
loadams/update-a6000-workflows
loadams/update-accelerate
loadams/update-amd-required-paths
loadams/update-classifiers
loadams/update-conda-pydantic
loadams/update-container-a6000
loadams/update-container-pre-compile
loadams/update-docker
loadams/update-docker-nv-sd
loadams/update-dockerfile
loadams/update-flake8
loadams/update-governance
loadams/update-hostname-I
loadams/update-hpu-1-18
loadams/update-hpu-docker-container
loadams/update-hpu-docker-image
loadams/update-hpu-gaudi-flow-more
loadams/update-just-nv-a6000-container
loadams/update-mii-transformers
loadams/update-nodejs-reate-pr-action
loadams/update-nv-accelerate
loadams/update-nv-inference-torch-ver
loadams/update-nv-lightning-test-cu-ver
loadams/update-nv-torch-latest-cpu-torch-ver
loadams/update-nv-torch-latest-cpu-version
loadams/update-pre-compile-ops-docker
loadams/update-pydantic
loadams/update-pyproject-toml
loadams/update-pytest
loadams/update-pytest-error-codes
loadams/update-real-latest
loadams/update-sd-triton
loadams/update-torch-27
loadams/update-transformers
loadams/update-transformers-cu116
loadams/update-version-txt-post-release
loadams/update-website-sidebar
loadams/update-whl-build-commands
loadams/x86-accelerator
loadams/xpu-readme
loadams/xpu-test
loadams/xpu-yml
lokoppak/ln_schedule_update
lokoppak/low_cpu_mem_usage_ut
lokoppak/new_pt_binding
lokoppak/quantization_3d
lokoppak/ref_ln
lsh
master
master-test
megatron2.4-3d
minjiaz/ds-seq-tutorial
minjiaz/moe-comm
minjiaz/moe-sharing
moe-full-tp
moe-inference/add-tutorial
moe-inference-tutorial
moe-inference-tutorial1
moe-pipelining
moe-timing
mosm/autotp_llama
mosm/autotp-he
mosm/bloom_dev
mosm/codegen
mosm/debug-ds-attn
mosm/debugger
mosm/dschat-news
mosm/inf-refactor
mosm/llama2
mosm/matmul_test
mosm/module_parser
mosm/mp_tutorial
mosm/opt-kernel
mosm/softmax
mosm/softmax-longseq
mosm/t5
mosm/test
mosm/tp_dev
mosm/wb-param
mrwyattii/expand-fp16-tests
mrwyattii/fix-accelerate-tests
mrwyattii/fix-for-mii-UT
mrwyattii/fix-inference-skipped-tests
mrwyattii/fix-launcher-user-args
mrwyattii/fix-multi-node-checks
mrwyattii/pin-datasets
mrwyattii/pydantic-2-support
mrwyattii/remove-symlinks
mrwyattii/rename-cpu-accelerator
mrwyattii/safetensor
mrwyattii/silence-backend-warning
mrwyattii/update-GH-permission
mrwyattii/update-MII-tests-infV2
multi-z3-prs
multi-z3-prs-r2
mwyatt/fp-quant-debug
mz/llama-support
neox-q-int8
niumanar/gan_optimizer
offloadpp-news
olruwase/accelerator_abstraction
olruwase/adam_types
olruwase/align_rrg_rs_param_order
olruwase/all_gather_profiling
olruwase/amd_configurable_pp_rtol
olruwase/assert_unused_parameters
olruwase/b16-debugging
olruwase/bf16-updates-2
olruwase/bf16_tied_weights_reduce
olruwase/bf16_update_hp_params
olruwase/bloom_176b_checkpoint_bc
olruwase/bloom-support
olruwase/build_compat_ops
olruwase/ci_pytorch_1x
olruwase/deepnvme_abstract_class
olruwase/deepnvme_docs
olruwase/disable_prefetch_profiler
olruwase/disable_z3_prefetcher
olruwase/dnvme_docs
olruwase/ds_2449
olruwase/ds_2921
olruwase/ds_3481
olruwase/ds_3680_2
olruwase/ds_3948
olruwase/ds_4998
olruwase/ds_5241
olruwase/ds_7150
olruwase/dynamic_graph_activation_checkpoint
olruwase/elastic-ckpt-refresh
olruwase/engine_destroy
olruwase/fast_persist
olruwase/fix_kernel_memory_bloat
olruwase/frozen_weights_unit_test
olruwase/fs_z3_trace_error_disable
olruwase/fs_z3_trace_log
olruwase/fs-zero3_trace_fix
olruwase/fuse_torch_adam_w
olruwase/gpt3-finetuning
olruwase/grad_accum_loss
olruwase/issue_3062
olruwase/llama2_empty_group
olruwase/local_storage_checkpoint
olruwase/lr_warmup_decay
olruwase/non_tensor_activation_checkpoint
olruwase/nvme_finetune
olruwase/nvme_offload_bug
olruwase/nvme_perf_sweep
olruwase/nvme_testsuite
olruwase/override_module_apply
olruwase/pr_6772
olruwase/refactor_universal_checkpoint
olruwase/restore_from_bit16_weights
olruwase/round_robin_gradient_option
olruwase/safe_pkg_check
olruwase/safe_py_subprocess
olruwase/save_checkpoint_latest_false
olruwase/save_zero3_fp16_weights
olruwase/set_zero_opt_grad
olruwase/setup_env_libaio
olruwase/trainable_parameters
olruwase/update_nvme_offload_states
olruwase/windows_blog
olruwase/z3_perf_tune
olruwase/z3_suppress_warning
olruwase/zcode_model_expert
olruwase/zero_inference_tokgen
olruwase/zero_inference_torch_version
olruwase/zero_offload_e2e
olruwase/zero_offload_fix_corner_case
olruwase/zero_offload_v3
olruwase/zero_optional_reduce_scatter
olruwase/zero_stage1_checkpoint_layout
olruwase/zero_stage1_elastic_checkpoint
olruwase/zero1_non_tensor_checkpoint
olruwase/zero2_grad_accum_bug
olruwase/zero2_offload_keyerror
olruwase/zero2_offload_rrb_divergence
olruwase/zero2_offload_slowdown
olruwase/zero2_trainable_parameters_v0.5.7
olruwase/zero2_trainable_parameters
olruwase/zero2_unbalanced_grad_reduction
olruwase/zero3_amp_autocast
olruwase/zero3_broken_tracing
olruwase/zero3_dp_norm_allreduce
olruwase/zero3_profile_fetch
olruwase/zero3_unboundlocal_bug
olruwase/zinf_none_swapper
paper
patch-z1-cont-grad
pr_moe_tutorial
preserve-CVDs
profiler-add-shape
qanthony/bigbird
qanthony/comms-bench
qanthony/nccl-backend
quantization-refresh
quantize-inference
refine-quantizer
remotes/origin/dev/tput
remove-tbx
remove-unused-quantize-settings
reyazda/adam-scalar-fix
reyazda/cpu_adam_jit_v2
reyazda/fix-inference-api
reyazda/pytorch-workspace-allocate
reyazda/remove_bertid
reyazda/support_AVX2_by_default
reyazda/test-hidden-dimension
reyazda/test-sparse
reyazda/test-sparse-v2
reyazda/test-transformer
reyazda/testing_embedding
reyazda/triton-new-sparse
reza/deepspeed_adam_merge_v3
reza/fix_adam_corner_case
reza/fix_adam_perf
reza/fix-adam-copyfp16
reza/megatron_kernel_integration
rtd-staging
saforem2/fix-missing-packages
saforem2/ucp-bug
saksham-zero1-fixes
samyam-overlap-comm
samyamr/elasticity
samyamr/fix-for-fragmented-linear-inputs
samyamr/gpt3-finetuning
samyamr/gpt3-finetuning-mixed-precision
samyamr/stage3-alignment-fix
samyamr/zero-2-debug
security-patch
shaden/textgen
smartreply_hotfix
sp/comm-opt
sp-mpu
sparse-attn/support-latest-triton
sparse-attn-cuda11
staging-amd
staging-amd-port
staging-amd-v2
staging-amd-v3
staging-comms-next-v2
staging-comms-v1
staging-deepnvme-gds-v1
staging-demo-feature-v0
staging-ds-chat-blog-v1
staging-ds-seq-v1
staging-inference-v2-5
staging-mii-update
staging-moe-next-v1
staging-oaas
staging-pld-v1
staging-pp
staging-test
staging-zero-dual-v2
staging-zero-dual-v3
staging-zero-dual-v5
staging-zero-inference-v1
stale-issues
stas00-dist-init-device-id
stas00-makefile
stas00-patch-2
stas00-patch-3
styoun/triton2.1-autotune
styoun/triton2.1
styoun/triton-flash2
styoun/zero-inf-8bit-q
subprocess-test
test-ac
test-cuda-11.7
tmp
tmp-old
tohtana/add_slides_meetup_japan
tohtana/allocate_test_port
tohtana/autocast_only_floating_values
tohtana/bcast_input
tohtana/bcast_warning_z3
tohtana/blog_win_jp
tohtana/cache_kv_requirements
tohtana/clean_after_test
tohtana/clean_all_param_coordinators
tohtana/clean_up_prefetch_param
tohtana/compile_no_grad
tohtana/compile-zero
tohtana/consistent_zero_grad
tohtana/dc_fix_symint_input
tohtana/dc_offload_debug
tohtana/debug_compile_backends
tohtana/debug_semaphore_leak
tohtana/deepcompile_fix_scheduling
tohtana/deepcompile_fix_selective_gather
tohtana/file_store_for_tests
tohtana/fix_bf16_opt_update_hp
tohtana/fix_chkpt_alignment
tohtana/fix_sort_dp_univ_ckpt
tohtana/fix_univ_chkpt_load
tohtana/fix_zero_init_patch
tohtana/fix-save-checkpoint-step
tohtana/get_offload_state_api
tohtana/lock_hf_cache_update
tohtana/log_run_tests
tohtana/merge_FPDT
tohtana/model_declaration_in_init_context
tohtana/offload_zero_buffers
tohtana/pipeline_with_compiled_module
tohtana/remove_step_on_init
tohtana/simplify_param_coordinator
tohtana/support_autocast
tohtana/test_with_pt25
tohtana/univ_ckpt_custom_shape
tohtana/z3_multi_dtypes
tohtana/z3_no_mixed_precision
token-drop
transformer/fix-layer-norm
transformer/injection
transformer/large-seq-support
transformer/triangular-mask
transformer-injection
transformer-kernel/support-arbitrary-hidden
triton-fix
ucp_blog
ulysses
ulysses-offload-tutorial
ulyssess-offload-blog
umchand/test_compiler
umchand/triton/bias_act
unify-benchmark-knowledge
update-flops-profiler-doc
update-flops-profiler-pool-compute
workaround-zero3
z1-offload-multigpu
z3-mem-leak
zero-ckpt-cpu-issue-v2
zhenyzhang-data
zheweiyao/quantize_update
d32b362f  silence warning  (mrwyattii committed 1 year ago)
0a0819b7  Option to exclude frozen weights for checkpoint save (#3953)  (tjruwase committed 1 year ago, Verified)
ceccfa3e  Make AMD/ROCm apex install to /blob to save test/compile time. (#3997)  (loadams committed 1 year ago, Verified)
7b850d3d  Re-enable skipped unit tests (#3939)  (mrwyattii committed 1 year ago, Verified)
1bc3b784  [CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node) (#3919)  (delock committed 1 year ago, Verified)
8afcda2a  ZeRO Gradient Accumulation Dtype. (#2847)  (jomayeri committed 1 year ago, Verified)
7290aace  [CPU] Skip CPU support unimplemented error (#3633)  (Yejing-Lai committed 1 year ago, Verified)
c79a104c  pre-commit hook (#3994)  (wangruohui committed 1 year ago, Verified)
488a1b98  fixing flops profiler formatting, units and precision (#3927)  (Alexander Jipa committed 1 year ago, Verified)
fb9aebbf  Fix checkpoint conversion when model layers share weights (#3825)  (awaelchli committed 1 year ago, Verified)
5dadf687  support HBM in utils/numa.py (#3918)  (delock committed 1 year ago, Verified)
fc8de76f  Simplify chain comparisons, remove redundant parentheses (#3912)  (digger-yu committed 1 year ago, Verified)
a655d7d3  Switch to torch.linalg.norm (#3984)  (loadams committed 1 year ago, Verified)
04b1f58e  fix duplicated unit test issue (#3951)  (mrwyattii committed 1 year ago, Verified)
a1effc91  different port ranges for xdist workers (#3975)  (mrwyattii committed 1 year ago, Verified)
cbf2f61a  add zero++ paper link (#3974)  (jeffra committed 1 year ago, Verified)
5b2dc7a8  bump to 0.10.1  (jeffra committed 1 year ago)
f5c834a6  fix(cpu_accelerator): :bug: Convert LOCAL_SIZE to integer (#3971)  (javsalgar committed 1 year ago, Verified)
31ac29dd  Create accelerator for apple silicon GPU Acceleration (#3907)  (NripeshN committed 1 year ago, Verified)
05a6cee1  do bcast only pp_group_size>1 (#3915)  (inkcherry committed 1 year ago, Verified)
7528035c  Use device_name instead of device index to support other device (#3933)  (hipudding committed 1 year ago, Verified)
4d965416  fix Megatron-DeepSpeed links (#3956)  (conglongli committed 1 year ago, Verified)
ed34ddca  Fix docs for checkpoints (#3955)  (loadams committed 1 year ago, Verified)
45cecc05  fix "ERROR: failed to solve: nvidia/cuda:11.7.0-devel-ubuntu18.04: docker.io/nvidia/cuda:11.7.0-devel-ubuntu18.04: not found" (#3930)  (KaiChen1008 committed 1 year ago, Verified)
aa54dba0  add xTrimoPGLM (#3940)  (jeffra committed 1 year ago, Verified)
103884ae  Update zero_to_fp32.py (#3936)  (PicoCreator committed 1 year ago, Verified)
aef6c65c  Reduce Unit Test Times (Part 3) (#3850)  (mrwyattii committed 1 year ago, Verified)
e59f69a8  remove the call to param.ds_tensor from print (#3928)  (HeyangQin committed 1 year ago, Verified)
e292343d  Del comment deepspeed.zero.Init() can be used as a decorator (#3894)  (hipudding committed 1 year ago, Verified)
ce535945  fix: change ==NONE to is (#3923)  (digger-yu committed 1 year ago, Verified)