NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)
* NCCL-based 1-bit Implementation + Refactor to add communication backends (#593)
* add nccl 1-bit optim.
* temporary commit to save stuff.
* Use dist collectives instead of mpi routines.
* remove old code for comm.
* Fix bugs. Still does not work.
* modify to test the nccl side code path
* Initial gather impl. Works intra-node.
* Updates to comm. phase 2. nccl comm. passed the tests.
* refactor code to introduce nccl/mpi as backends for onebit adam.
* Refactor updates to test/engine.
* Fix compile/runtime errors.
* simplify support for nccl/mpi backends.
* Add missing file
* Add compression backend in constructor. Revert later.
* modify test with some perf counting.
* Implement a true non-blocking gather for nccl side.
* Revert "Add compression backend in constructor. Revert later."
This reverts commit df8c40d3105e9f2542a8aa6619e80d675a09753f.
* improve the 1-bit adam test.
* Refactor comm. and compression backend in 1-bit adam.
* Fix the test.
* Fix runtime errors and typos in nccl backend
* fix mpi backend. modify tests.
* modify nccl perf test.
* fix mpi side errors.
* Add an mpi perf test
* Sync DSE.
* Remove old collectives file.
* Undo a typo.
* Graceful failure for torch versions that don't support nccl pt2pt.
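
Note on the graceful-failure check: a minimal sketch of one way such a version gate could look, not the code from this change. The helper names (`parse_major_minor`, `check_nccl_p2p_support`) are hypothetical, and the 1.8 threshold is an assumption about when torch gained NCCL point-to-point (send/recv) support. The version string is passed in so the sketch runs without torch installed; in practice it would be `torch.__version__`.

```python
import re

# Assumption: NCCL point-to-point support is available from torch 1.8 onward.
MIN_TORCH_FOR_NCCL_P2P = (1, 8)


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.8.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_nccl_p2p_support(torch_version):
    """Raise a descriptive error if the torch version is below the threshold.

    Tuples are compared lexicographically, so (2, 0) passes against (1, 8),
    which a separate 'major >= 1 and minor >= 8' test would wrongly reject.
    """
    found = parse_major_minor(torch_version)
    if found < MIN_TORCH_FOR_NCCL_P2P:
        required = ".".join(str(part) for part in MIN_TORCH_FOR_NCCL_P2P)
        raise RuntimeError(
            f"NCCL backend for 1-bit Adam needs torch >= {required}, "
            f"found {torch_version}. Upgrade torch or use the MPI backend."
        )
```

Calling `check_nccl_p2p_support(torch.__version__)` once, when the NCCL backend is constructed, would surface the problem at startup with a readable message rather than at the first send/recv call.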