[CANN] Support cpu offload optimizer for Ascend NPU (#4568)
Support the cpu_adam, cpu_adagrad and cpu_lion optimizers on Ascend NPU. All
of these optimizers run on the host; the difference between backends
is how params are copied back to the device. This commit adds a new symbol
called "__ENABLE_CANN__", which enables compilation of the code adapted to NPU.
The NPU builder adds the required header files and libraries for
compiling, according to CANN's compilation manual.
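As a rough illustration of how a compile-time symbol such as "__ENABLE_CANN__" can gate backend-specific code, here is a minimal sketch. It is not the code from this commit: `copy_back_path` and `copy_params_back` are hypothetical names, and the actual CANN runtime call is deliberately left out, with a plain host copy standing in so the example runs without an NPU.

```cpp
// Illustrative sketch only; helper names are hypothetical, not DeepSpeed APIs.
#include <cstddef>
#include <cstring>
#include <string>

// Reports which copy-back path was selected at compile time.
inline std::string copy_back_path()
{
#if defined(__ENABLE_CANN__)
    return "npu";
#else
    return "host";
#endif
}

// The optimizer step has already run on the host; this only marks the spot
// where a backend-specific copy back to the device would be chosen.
inline void copy_params_back(float* dst, const float* src, std::size_t count)
{
#if defined(__ENABLE_CANN__)
    // NPU build: the real code would hand the buffer to the CANN runtime here.
    // That call is intentionally omitted from this sketch.
#endif
    // Stand-in for the copy itself, so the sketch runs without an NPU.
    std::memcpy(dst, src, count * sizeof(float));
}
```

Building with `-D__ENABLE_CANN__` would select the "npu" branch; without it, the sketch takes the "host" branch.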
Note that there is no FusedLion implementation for NPU, so the test_cpu_lion
test case should be disabled until a FusedLion optimizer is implemented.
Besides, when NPU is selected as the accelerator, ds_report will show
torch_npu and CANN information.
With this PR, all DeepSpeed test cases in
[huggingface/accelerate](https://github.com/huggingface/accelerate/tree/main/tests/deepspeed)
pass.
This is part of the feature list for Ascend NPU support; see #4567.
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>