Samyamr/inference hook fix (#851)
* Fix mis-aligned-grad
When a parameter's size is not divisible by the world size, the partitioned gradients are mis-aligned due to incorrect padding handling. This PR fixes that.
* Formatting fix
* Adding static_scale test back for Z3, and also changing hidden size so that it is not divisible by world_size
* also removing alignment from flat fp16 buffers
* Testing for hidden dim alignment
* inference hook fix
* Update stage3.py
* formatting
* [bug-fix] move params to gpu if offload params is turned off
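
The padding issue described under "Fix mis-aligned-grad" can be illustrated with a minimal, framework-free sketch. It is not the code in stage3.py: `partition_with_padding` is a hypothetical helper, and plain Python lists stand in for flattened tensors. The assumption it illustrates is that a parameter and its gradient must be padded and split the same way so that each rank's gradient shard lines up with its parameter shard.

```python
def partition_with_padding(flat, world_size, pad_value=0.0):
    """Hypothetical helper: split `flat` into `world_size` equal-length shards.

    The tail is padded so every rank holds a shard of the same size, even when
    len(flat) is not divisible by world_size.
    """
    numel = len(flat)
    shard_size = -(-numel // world_size)  # ceiling division
    padding = shard_size * world_size - numel
    padded = list(flat) + [pad_value] * padding
    return [padded[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]


# 7 elements across 4 ranks: not evenly divisible, so one padding element is needed.
param = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
grad = [0.5 * x for x in param]

param_shards = partition_with_padding(param, world_size=4)
grad_shards = partition_with_padding(grad, world_size=4)

# With identical padding, rank r's gradient shard matches its parameter shard
# element for element; only the last rank carries the padding slot.
for p_shard, g_shard in zip(param_shards, grad_shards):
    assert len(p_shard) == len(g_shard)
```

Using a hidden size that is not divisible by world_size in tests, as the commits above do, exercises exactly this padded last shard.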
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>