microsoft/DeepSpeed · Commits
Branch: loadams/fix-torch-compiler-hasattr
5ce448d3  Switch hasattr to check for compiler rather than compile: torch.compile was introduced in torch 2.0, but torch.compiler only in torch 2.1, so probing for compiler fixes builds against torch 2.0.  (loadams, 1 year ago)
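The fix above is a feature probe rather than a version comparison. A minimal sketch of the idea, using `SimpleNamespace` objects as hypothetical stand-ins for the torch module at each version (illustration only, not DeepSpeed's actual code):

```python
from types import SimpleNamespace

# Hypothetical stand-ins: torch 2.0 ships torch.compile, but the
# torch.compiler namespace only arrives in torch 2.1.
torch_2_0 = SimpleNamespace(compile=lambda fn: fn)
torch_2_1 = SimpleNamespace(compile=lambda fn: fn, compiler=object())

def is_compile_supported(torch_mod):
    # Probe for "compiler" (torch >= 2.1) rather than "compile"
    # (torch >= 2.0), so builds against torch 2.0 do not take the
    # 2.1-only code path.
    return hasattr(torch_mod, "compiler")

print(is_compile_supported(torch_2_0))  # False
print(is_compile_supported(torch_2_1))  # True
```

Probing for the attribute that gates the code path, instead of parsing version strings, also keeps the check correct for patched or vendored builds.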
688239e3  [xs] Fix ZeRO++ convergence test (#5061)  (yundai424, 1 year ago)
961bc856  Optimize clip_grad_norm_ function (#4915)  (mmhab, 1 year ago)
4f477328  [NPU] Replace 'cuda' with get_accelerator().device_name() (#5095)  (minchao-sun, 1 year ago)
b42a4706  HPU Accelerator: fix supported_dtypes API (#5094)  (nelyahu, 1 year ago)
ec49222c  Update nv-accelerate to latest torch (#5040)  (loadams, 1 year ago)
c3cfe96b  Enable torch.compile with ZeRO (experimental) (#4878)  (tohtana, 1 year ago)
e212845e  Add backwards compatibility with older versions of diffusers (<0.25.0) (#5083)  (lekurile, 1 year ago)
e469e7d9  Update torch version for nv-torch-latest-cpu (#5086)  (loadams, 1 year ago)
55eb78ee  Revert "Update nv-torch-latest-version"  (loadams, 1 year ago)
889620b0  Update nv-torch-latest-version  (loadams, 1 year ago)
5a721de3  Stop tracking backward chain of broadcast in initialization (#5075)  (tohtana, 1 year ago)
f02d7bda  Fix verification for ZeRO3 leaf modules (#5074)  (tohtana, 1 year ago)
9922270f  Further refactor deepspeed.moe.utils and deepspeed.moe.layer type hints (#5060)  (Matthew Hoffman, 1 year ago)
3e6d6069  [doc] Fix wrong default stage3_param_persistence_threshold in the doc (#5073)  (ByronHsu, 1 year ago)
dde64b00  Make batch size documentation clearer (#5072)  (segyges, 1 year ago)
592325ab  [ZeRO++ qgZ] Fall back to reduce_scatter if `tensor.numel() % (2 * global_world_size) != 0` (#5056)  (ByronHsu, 1 year ago)
2eafe41b  Adding hccl to init_distributed function description (#5034)  (nelyahu, 1 year ago)
a049370c  Update import for changes to latest diffusers (#5065)  (mrwyattii, 1 year ago)
567f97b2  Load linear layer weight with given dtype (#4044)  (polisettyvarma, 1 year ago)
61daaa1e  Optimize grad_norm calculations by reducing device/host dependency (#4974)  (nelyahu, 1 year ago)
19e0dc39  Delay reduce-scatter for ZeRO3 leaf modules (#5008)  (tohtana, 1 year ago)
6de31de7  [NPU] Change log level to debug (#5051)  (CurryRice233, 1 year ago)
449f9ad0  Fix broken model names in inference CI (#5053)  (mrwyattii, 1 year ago)
76ec8b49  [doc] Update inference-related docs from `mp_size` to `tensor_parallel` for TP (#5048)  (yundai424, 1 year ago)
971d82b5  MoE type hints (#5043)  (Matthew Hoffman, 1 year ago)
88cca60a  [NPU] Add NPU to support hybrid engine (#4831)  (CurryRice233, 1 year ago)
93e9537d  Fix nv-torch-latest-cpu CI (#5045)  (mrwyattii, 1 year ago)
8f627700  launcher_helper: enable fds passing (#5042)  (YizhouZ, 1 year ago)
24f20ef0  Update inference pages to point to FastGen (#5029)  (mrwyattii, 1 year ago)