Added 4-byte alignment on NCCL/RCCL (#1328)
* Added 4-byte alignment on NCCL/RCCL
* pre-commit formatting fixes
* Fix for checkpoint loading with optimizer partitioning
* Improved assert message
* Added unit tests for NCCL/RCCL 4-byte alignment
* Bug fix
* Changed alignment to be implicit
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>