Autotune ZenFlow affinity (#7506)
This PR address the following ZenFlow optimizer core binding issue.
https://github.com/deepspeedai/DeepSpeed/issues/7478
With this PR, ZenFlow optimizer worker would derive its core binding
from deepspeed core binding mechanism. The algorithm is as following:
1. Each DeepSpeed rank get its core binding by using DeepSpeed command
line `--bind_cores_to_rank`, this command would assign each CPU physical
cores to different workers
2. When spawing ZenFlow optimizer worker, DeepSpeed would split current
CPU affinity list into two sublist: pt_affinity and zf_affinity
3. zf_affinity would be used to set affinity of ZenFlow optimizer
worker. pt_affinity would be used to set current pytorch process.
4. By default, one cores is reserved by each pytorch process, the rest
is used by ZenFlow optimizer worker. The number of cores reserved for
pytorch process can be changed by ZenFlow config variable:
`pt_reserved_cores`
---------
Signed-off-by: Guokai Ma <guokai.ma@gmail.com>
Signed-off-by: Ma, Guokai <guokai.ma@intel.com>
Signed-off-by: aeeeeeep <aeeeeeep@proton.me>
Signed-off-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: aeeeeeep <aeeeeeep@proton.me>
Co-authored-by: Zhipeng Wang <zhipeng.rainbowserie@gmail.com>
Co-authored-by: Zhipeng Wang <zwanga@wustl.edu>
Co-authored-by: Peng Du <pedu@linkedin.com>
Co-authored-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: Zhipeng Wang <zhipengbayern@gmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>