[XPU] [DeepNVMe] use same cpu_op_desc_t with cuda (#6645)
We have found that #6592 uses `_pinned_tensor_mgr` to create the CPU bounce
buffer, which is the same approach our XPU accelerator currently uses.
So there is no need for an XPU-device-specific cpu_op_desc_t.
In this PR:
1. Remove the custom csrc/xpu/aio/deepspeed_cpu_op.cpp.
2. Modify the XPU async_io op builder.
This cannot easily be done by reverting #6532, because we added some
source files when the GDS feature last went into DS. So we are filing this new PR :)