DeepSpeed
f03d416e - add --bind_cores_to_rank to zero offload tutorial (#7474)

Commit
164 days ago
In ZeRO offload, significant time is spent in CPUAdam, which runs on the CPU. Passing `--bind_cores_to_rank` to the deepspeed launch command therefore helps improve ZeRO offload performance. This PR adds the flag to the ZeRO offload tutorial to increase user awareness.

For Qwen2.5-3B finetuning on 2 A100-40B cards, on a host with 128 CPU cores, the average step time is as follows, a nearly 1.3x performance improvement:

without `--bind_cores_to_rank`: 3084.44ms per step
with `--bind_cores_to_rank`: 2383.16ms per step

---------

Co-authored-by: Olatunji Ruwase <tjruwase@gmail.com>
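As a rough check of the "near 1.3x" figure, the sketch below recomputes the speedup from the two reported step times and assembles a launch command that includes the flag. The script name `finetune.py` and the helper names `speedup` and `launch_command` are hypothetical, chosen only for illustration; the commit message does not give the actual benchmark command line.

```python
import shlex

# Average step times reported in the commit message (milliseconds per step).
STEP_MS_UNBOUND = 3084.44  # without --bind_cores_to_rank
STEP_MS_BOUND = 2383.16    # with --bind_cores_to_rank


def speedup(baseline_ms: float, improved_ms: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    return baseline_ms / improved_ms


def launch_command(script: str, script_args: tuple = ()) -> str:
    """Build a deepspeed launch command with core binding enabled.

    `script` and `script_args` are placeholders for the user's own
    training script and its arguments (assumed, not from the commit).
    """
    return shlex.join(["deepspeed", "--bind_cores_to_rank", script, *script_args])


ratio = speedup(STEP_MS_UNBOUND, STEP_MS_BOUND)
saved_ms = STEP_MS_UNBOUND - STEP_MS_BOUND

print(f"speedup: {ratio:.2f}x, saved {saved_ms:.2f} ms per step")
print(launch_command("finetune.py"))  # hypothetical script name
```

The ratio is computed as baseline time divided by improved time, so values above 1 mean the run with core binding is faster.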