CUDA optional deepspeed ops (#2507)
* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU
* guard the CUDA-related include files and variables
* remove CUDA dependency from op_builder when building against CPU
* fixing the builder issues
* fix formatting
* return true when there is no mismatch on the CUDA version
* guard for when CUDA is not available & test with CPU-only environment
* Update cpu_adam and cpu_adagrad
* Format fixes
* Add configurable half precision type; Build/run in CUDA environment
* Run cpu_adam and cpu_adagrad in CPU-only environment
* Mark CUDA-only unit tests
* CPU environment CI
* Format fixes
* Remove --forked
* Add --forked
* CPU-only CI should pass
* Format fixes
* Format fixes
* Remove scattered pytest.skip
* Fix cpu_adam unit test
* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Address PR feedback
* OpenMP linking
* Fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>