Make the quantized data shape compatible with original tensor shape #5483
Commits

085fcd8b  Make the quantized data shape compatible with original tensor shape
a83b384f  change the scale and quantized data format
048648dd  minor fixes
bf128934  fix
b18f71f6  minor fix
4d6e04ba  Merge branch 'master' into fix-quantized-shape
f9244558  more fixes
e03c0f49  Merge branch 'fix-quantized-shape' of https://github.com/Snowflake-La…
d9cfba6e  Improve _configure_optimizer() final optimizer log (#5528)
2bbc6806  Enhance testing: Skip fused_optimizer tests if not supported. (#5159)
b3ab6265  Skip the UT cases that use unimplemented op builders. (#5372)
4494c86c  rocblas -> hipblas changes for ROCm (#5401)
2c0dcac5  Rocm warp size fix (#5402)
f53895fe  Optimize zero3 fetch params using all_reduce (#5420)
bb146c33  CPUAdam fp16 and bf16 support (#5409)
31f11c05  Fix the TypeError for XPU Accelerator (#5531)
35b48135  Fix RuntimeError for moe on XPU: tensors found at least two devices (…
cf0ccb5a  Remove synchronize calls from allgather params (#5516)
e388056c  Avoid overwrite of compiled module wrapper attributes (#5549)
5ff0d446  Small typos in functions set_none_gradients_to_zero (#5557)
29ab009b  Adapt doc for #4405 (#5552)
633da3d9  Update to HF_HOME from TRANSFORMERS_CACHE (#4816)
9db010e5  [INF] DSAttention allow input_mask to have false as value (#5546)
bd2b2ef1  Add throughput timer configuration (#5363)
3c5aa00a  Add Ulysses DistributedAttention compatibility (#5525)
d7f9be61  Add hybrid_engine.py as path to trigger the DS-Chat GH workflow (#5562)
c160d76a  Update HPU docker version (#5566)
c203830f  [MiCS] Remove the handle print on DeepSpeed side (#5574)
5e5c8a7c  Rename files in fp_quantize op from quantize.* to fp_quantize.* (#5577)
ff01ade2  Update to fix sidebar over text (#5567)
83920f6f  DeepSpeedCheckpoint: support custom final ln idx (#5506)
a6076cf1  Update minor CUDA version compatibility (#5591)
9db99709  Add slide deck for meetup in Japan (#5598)
c6f151c1  Fixed the Windows build. (#5596)
0bf35115  estimate_zero2_model_states_mem_needs: fixing memory estimation (#5099)
cca53b0b  Fix cuda hardcode for inference woq (#5565)
31815d9c  fix sequence parallel(Ulysses) grad scale for zero0 (#5555)
6ad125e4  Add Compressedbackend for Onebit optimizers (#5473)
9c15b8f7  Updated hpu-gaudi2 tests content. (#5622)
2e4bc1d3  Pin transformers version for MII tests (#5629)
e5b4d414  WA for Torch-compile-Z3-act-apt accuracy issue from the Pytorch repo …
8a4d03c7  stage_1_and_2: optimize clip calculation to use clamp (#5632)
5e5b1f7b  Fix overlap communication of ZeRO stage 1 and 2 (#5606)
c47ad5f9  Merge branch 'master' of https://github.com/Snowflake-Labs/deepspeed …
277902a1  remove float8 dtype
74311af7  Merge branch 'master' into fix-quantized-shape
9eb12fbe  Merge branch 'master' into fix-quantized-shape