DeepSpeed
11a62a06 - Add Compressedbackend for Onebit optimizers (#5473)

Commit

2 years ago

Add Compressedbackend for Onebit optimizers (#5473)

While adding onebit optimizer support for XPU devices, we noticed that across accelerators the main difference in the implementation of `compressed_allreduce` lies in `packbits` and `unpackbits`. CUDA uses cupy and NPU uses torch_npu. Instead of replacing these with XPU-only functions, we provide a CompressedBackend that does the `compressed_allreduce` work and lets users add their own packbits/unpackbits kernels, which gives a general path for all kinds of accelerators.

In this PR, we:
1. Add CompressedBackend for onebitAdam, onebitLamb and zerooneAdam
2. Add an XPU implementation of packbits/unpackbits with SYCL, built in PackbitsBuilder
3. Add tests for onebit with CompressedBackend

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
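To illustrate what `packbits` and `unpackbits` contribute to a 1-bit compressed exchange, here is a minimal pure-Python sketch. It is not the DeepSpeed CompressedBackend or the SYCL kernel from this PR. The function names `compress` and `decompress`, the most-significant-bit-first layout, and the RMS scale are assumptions chosen for illustration. A real backend would run these steps as device kernels on tensors.

```python
import math
from typing import List, Tuple


def packbits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, most significant bit first.

    The last byte is zero-padded on the right when len(bits) is not a
    multiple of 8 (an assumed layout, chosen for this sketch).
    """
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        byte <<= 8 - len(chunk)  # pad a partial final byte
        out.append(byte)
    return bytes(out)


def unpackbits(packed: bytes, count: int) -> List[int]:
    """Inverse of packbits: recover the first `count` bits."""
    bits = []
    for byte in packed:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits[:count]


def compress(values: List[float]) -> Tuple[bytes, float, List[float]]:
    """Hypothetical 1-bit compression: sign bits plus a single scale.

    Returns the packed signs, the scale, and the residual error that an
    error-feedback scheme would add back on the next step.
    """
    if not values:
        raise ValueError("values must be non-empty")
    # Assumed scale: root mean square of the input.
    scale = math.sqrt(sum(v * v for v in values) / len(values))
    bits = [1 if v >= 0 else 0 for v in values]
    approx = [scale if b else -scale for b in bits]
    error = [v - a for v, a in zip(values, approx)]
    return packbits(bits), scale, error


def decompress(packed: bytes, scale: float, count: int) -> List[float]:
    """Rebuild the +scale / -scale approximation from packed sign bits."""
    return [scale if b else -scale for b in unpackbits(packed, count)]
```

In this sketch only the packed bytes and one float per buffer would need to cross the wire, which is why the accelerator-specific part reduces to the two bit-packing routines.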

References

#5473 - Add Compressedbackend for Onebit optimizers

Author

Liangliang-Ma

Parents

6b6d6418
