DeepSpeed
f88d0f8d - Fix async_io ops building error on Huawei Ascend NPU (#7894)

Commit
20 days ago
Fix async_io ops building error on Huawei Ascend NPU (#7894)

### Summary

Fixes the async_io ops build error on Huawei Ascend NPU.

### Environment

| Item                       | Version            |
| -------------------------- | ------------------ |
| kernel version             | 5.15.0-101-generic |
| torch version              | 2.8.0+cpu          |
| deepspeed info             | 0.18.7             |
| deepspeed wheel compiled w | torch 2.8          |
| torch_npu version          | 2.8.0              |
| ascend_cann version        | 8.1.RC1            |

The error appears when the DeepSpeed config sets `zero_optimization.offload_optimizer.device` to `"nvme"`; with `"cpu"` it works.

### Error Messages

When offloading from NPU to NVMe, the import fails:

```text
ImportError: /.../async_io.so: undefined symbol: _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
```

`nm` reports the symbol as undefined (`U`) in `async_io.so`, even though it is defined in `csrc/aio/py_lib/deepspeed_py_io_handle.cpp`. That source file was not part of the NPU build:

```sh
nm async_io.so | rg _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
#                  U _ZN21deepspeed_io_handle_t18_create_io_op_descEbRKN2at6TensorEiPKcbl
```

### Solution

1. `op_builder/npu/async_io.py`:

```python
class AsyncIOBuilder(NPUOpBuilder):
    def sources(self):
        return [
            'csrc/aio/py_lib/deepspeed_py_copy.cpp',
            'csrc/aio/py_lib/py_ds_aio.cpp',
            'csrc/aio/py_lib/deepspeed_py_aio.cpp',
            'csrc/aio/py_lib/deepspeed_py_aio_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_thread.cpp',
            'csrc/aio/common/deepspeed_aio_utils.cpp',
            'csrc/aio/common/deepspeed_aio_common.cpp',
            'csrc/aio/common/deepspeed_aio_types.cpp',
            'csrc/aio/py_lib/deepspeed_pin_tensor.cpp',
            # Adds 3 source files:
            'csrc/aio/py_lib/deepspeed_py_io_handle.cpp',
            'csrc/aio/py_lib/deepspeed_aio_op_desc.cpp',
            'csrc/aio/py_lib/deepspeed_cpu_op.cpp'
        ]
```

2. `csrc/aio/py_lib/deepspeed_cpu_op.cpp`:

```cpp
#if defined(__ENABLE_CANN__)
    // `DS_BUILD_OPS=1 install.sh` complains that 'torch_npu' has not
    // been declared, so this inlines `torch_npu::utils::is_npu`.
    if (_buffer.is_privateuseone()) {
        auto device = at::Device("npu:0");
        _buffer.copy_(_cpu_buffer.to(device));
    }
#endif
```

Signed-off-by: Huang Yifan <yifan0610@foxmail.com>
Co-authored-by: Olatunji Ruwase <tunji.ruwase@snowflake.com>
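
### Checking for the missing symbol

The `nm` check from the error report can be scripted to confirm that a rebuilt `async_io.so` no longer leaves DeepSpeed symbols unresolved. The following is a minimal sketch, not part of this commit: the helper names `undefined_symbols` and `check_library` are hypothetical, and it assumes GNU `nm` is on `PATH` and prints undefined entries as `U <symbol>` with no address column.

```python
from __future__ import annotations

import subprocess


def undefined_symbols(nm_output: str, needle: str = "") -> list[str]:
    """Return symbols that nm marks as undefined ('U') and that contain `needle`."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, so they split into
        # exactly two fields: the type letter 'U' and the symbol name.
        if len(parts) == 2 and parts[0] == "U" and needle in parts[1]:
            found.append(parts[1])
    return found


def check_library(path: str, needle: str = "") -> list[str]:
    """Run `nm -D` on a shared object and return matching undefined symbols."""
    result = subprocess.run(
        ["nm", "-D", path], capture_output=True, text=True, check=True
    )
    return undefined_symbols(result.stdout, needle)


# Example call (the path is a placeholder, as in the error message):
#   check_library("/.../async_io.so", "deepspeed_io_handle_t")
```

A shared object normally has many undefined symbols that are resolved at load time from libtorch or libc, so the `needle` filter matters: only an undefined symbol whose definition should live inside the extension itself, such as a `deepspeed_io_handle_t` member, points to a source file missing from `sources()`.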