DeepSpeed
d40cf466 - Avoid graph break by removing redundant requires_grad attr change (#7158)

Avoid graph break by removing redundant requires_grad attr change (#7158)

This PR is a continuation of the efforts to improve DeepSpeed performance when using PyTorch compile.

Dynamo breaks the graph at `flat_tensor.requires_grad = False` because the assignment:

* Is a side-effecting operation on tensor metadata
* Occurs in a context where Dynamo expects static tensor properties for tracing

The assignment is redundant and can be safely removed because:

* The `_allgather_params()` function is already decorated with `@torch.no_grad()`, which ensures the desired property
* `flat_tensor` is created with `torch.empty()`, which sets `requires_grad=False` by default

See the sketch after the sign-off trailers below.

---------

Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
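For illustration, here is a minimal, runnable sketch of why the assignment is redundant. `allgather_params_sketch` and its `partition_sizes` argument are hypothetical stand-ins, not DeepSpeed's actual `_allgather_params()`; only the allocation pattern described in the commit message is reproduced.

```python
import torch

# Hypothetical stand-in for DeepSpeed's _allgather_params(). The real
# function gathers partitioned parameters across ranks; here we only
# reproduce the tensor-allocation pattern the commit describes.
@torch.no_grad()
def allgather_params_sketch(partition_sizes):
    # torch.empty() creates tensors with requires_grad=False by default,
    # and @torch.no_grad() disables autograd tracking regardless, so an
    # explicit `flat_tensor.requires_grad = False` (a metadata mutation
    # that forces a Dynamo graph break) adds nothing.
    flat_tensor = torch.empty(sum(partition_sizes), dtype=torch.float16)
    assert flat_tensor.requires_grad is False
    return flat_tensor

if __name__ == "__main__":
    t = allgather_params_sketch([4, 8])
    print(t.requires_grad)  # False, with no metadata mutation needed
```

Because the function body now contains no side-effecting attribute write, Dynamo can trace through it without breaking the graph.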