fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator

The on-device flatten path (introduced in #7828) passes nn.Parameter objects
with requires_grad=True to torch.cat(), creating a flat buffer with a
CatBackward0 grad_fn. Later, _unflatten_dense_tensors produces SplitBackward0
views that are assigned to the model params. An inplace copy_() on these views
during the optimizer step raises:

    RuntimeError: Output 0 of SplitBackward0 is a view and is being modified inplace.

This especially affects CPU training where CPU_Accelerator.is_available()
returns True and available_memory() returns system RAM, so the on-device path
is always taken.

Fix: add .detach() to the flattened buffer, matching the implicit detach
behavior of the CPU-offload path (param.data.cpu() + .to(device)).

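A minimal repro sketch of the failure and the fix (illustrative only; it calls
torch.cat() and Tensor.split() directly rather than DeepSpeed's actual flatten
helpers):

```python
import torch

# Two small "params" standing in for model parameters (requires_grad=True).
params = [torch.nn.Parameter(torch.randn(3)) for _ in range(2)]

# Buggy path: cat over requires_grad params records CatBackward0 on the
# flat buffer, so splitting it yields SplitBackward0 multi-output views.
flat_bad = torch.cat(params)            # grad_fn=CatBackward0
views_bad = flat_bad.split(3)           # each view: grad_fn=SplitBackward0
err = None
try:
    views_bad[0].copy_(torch.zeros(3))  # inplace write into a multi-output view
except RuntimeError as e:
    err = str(e)                        # "Output 0 of SplitBackward0 is a view..."

# Fixed path: .detach() severs the autograd graph, so the split produces
# plain views and the inplace copy_() is allowed.
flat_ok = torch.cat(params).detach()
views_ok = flat_ok.split(3)
views_ok[0].copy_(torch.zeros(3))       # succeeds
```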
Also rename flatten_on_gpu -> flatten_on_accelerator and replace GPU-specific
terminology in comments/logs with accelerator-generic equivalents.

Signed-off-by: Guokai Ma <guokai.ma@intel.com>