fix(zero): detach flat buffer to prevent autograd inplace error on CPU accelerator (#7948)
The on-device flatten path (introduced in #7828) passes nn.Parameter
objects with requires_grad=True to torch.cat(), creating a flat buffer
with a CatBackward0 grad_fn. Later, _unflatten_dense_tensors produces
SplitBackward0 views that are assigned to the model params. An in-place
copy_() on these views during the optimizer step raises:
RuntimeError: Output 0 of SplitBackward0 is a view and is being modified
inplace.
This especially affects CPU training, where
CPU_Accelerator.is_available() returns True and available_memory()
returns system RAM, so the on-device path is always taken.
Fix: add .detach() to the flattened buffer, matching the implicit detach
behavior of the CPU-offload path (param.data.cpu() + .to(device)).
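A minimal standalone sketch of the failure mode and the fix, using
torch.cat()/torch.split() directly in place of DeepSpeed's flatten/unflatten
helpers (an assumption for illustration; the real code path goes through
_unflatten_dense_tensors):

```python
import torch

params = [torch.nn.Parameter(torch.randn(4)) for _ in range(2)]

# Buggy path: cat() of requires_grad=True params records CatBackward0,
# so the split outputs are autograd views (SplitBackward0).
flat = torch.cat([p.view(-1) for p in params])
views = torch.split(flat, [p.numel() for p in params])
try:
    views[0].copy_(torch.zeros(4))  # in-place write on a multi-output view
except RuntimeError as e:
    print(e)  # "Output 0 of SplitBackward0 is a view and is being modified inplace..."

# Fixed path: detaching the flat buffer drops the grad_fn, so the split
# outputs are plain views and in-place copy_() succeeds.
flat = torch.cat([p.view(-1) for p in params]).detach()
views = torch.split(flat, [p.numel() for p in params])
views[0].copy_(torch.zeros(4))  # OK: no autograd graph to protect
```

This mirrors why the CPU-offload path never hit the error: param.data.cpu()
already returns a tensor outside the autograd graph.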
Also rename flatten_on_gpu -> flatten_on_accelerator and replace
GPU-specific terminology in comments/logs with accelerator-generic
equivalents.
---------
Signed-off-by: Guokai Ma <guokai.ma@intel.com>
Signed-off-by: Ma, Guokai <guokai.ma@gmail.com>