DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators (#4833)
Until now, the code offloaded device memory by replacing each
torch.nn.Parameter's data with new CPU storage: all params were
flattened on the host and then moved to the device. On some
accelerators, however, a torch.nn.Parameter that lives on the device
cannot be assigned CPU storage.
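
For illustration only, a minimal sketch of the old pattern (simplified;
the function name and signature are hypothetical, not the actual
DeepSpeed code):

```python
from torch._utils import _flatten_dense_tensors

def flatten_on_host_old(params, device):
    # Re-point every device param at CPU storage to free device memory.
    # This assignment is the step some accelerators cannot perform.
    for p in params:
        p.data = p.data.cpu()
    # Flatten the (now CPU-resident) params and move the single flat
    # buffer to the device.
    return _flatten_dense_tensors([p.data for p in params]).to(device)
```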
This PR instead copies each param's data into a new CPU tensor and
shrinks the device storage. Later, once the flat buffer has been moved
to the device, param.data becomes a view into that flat buffer.
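
A minimal sketch of the new flow (again simplified and hypothetical;
the storage-shrinking call shown stands in for whatever the actual
code uses):

```python
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def flatten_via_cpu_copy(params, device):
    # Copy each param's data into a fresh CPU tensor; the device param
    # itself is never assigned CPU storage.
    cpu_copies = [torch.empty_like(p.data, device="cpu").copy_(p.data)
                  for p in params]
    for p in params:
        # Shrink the device-side storage to release its memory.
        # (untyped_storage() is the PyTorch 2.x spelling; older versions
        # would use p.data.storage().resize_(0).)
        p.data.untyped_storage().resize_(0)
    # Flatten on the host, move the flat buffer to the device, and
    # re-point each param at its view of the flat buffer.
    flat = _flatten_dense_tensors(cpu_copies).to(device)
    for p, view in zip(params, _unflatten_dense_tensors(flat, cpu_copies)):
        p.data = view
    return flat
```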
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>