[FSDP] Skip `_use_sharded_views()` for `SHARD_GRAD_OP` (#98250)
This PR makes `SHARD_GRAD_OP` (and `_HYBRID_SHARD_ZERO2`) skip `_use_sharded_views()` in the post-forward reshard, since those strategies do not free the unsharded flat parameter and can therefore preserve the unsharded views across forward and backward. This saves nontrivial CPU overhead both in the post-forward reshard (skipping `_use_sharded_views()`) and in the pre-backward unshard (skipping `_use_unsharded_views()`).
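For context, here is a minimal sketch of the public API path that exercises the optimized code: wrapping a module with the `SHARD_GRAD_OP` strategy. It assumes a `torchrun` launch (which sets `RANK`/`WORLD_SIZE`/`LOCAL_RANK`), an NCCL backend, and CUDA; the module and sizes are illustrative.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

# Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = nn.Linear(1024, 1024).cuda()
# SHARD_GRAD_OP keeps the unsharded flat parameter alive between forward
# and backward, so after this PR the post-forward reshard skips
# `_use_sharded_views()` and the pre-backward unshard skips
# `_use_unsharded_views()`.
fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)

x = torch.randn(8, 1024, device="cuda")
loss = fsdp_model(x).sum()
loss.backward()  # pre-backward hook no longer re-creates unsharded views
```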
<details>
<summary>(Before) Pre-backward hook: 4.356 ms</summary>
<img width="812" alt="Screenshot 2023-04-03 at 6 32 19 PM" src="https://user-images.githubusercontent.com/31054793/229641309-778cf1f9-4b5b-42ec-b2d8-0a1e6e7ce330.png">
</details>
<details>
<summary>(After) Pre-backward hook: 1.044 ms</summary>
![Screenshot 2023-04-04 at 9 05 53 AM](https://user-images.githubusercontent.com/31054793/229800917-9580ce6b-3721-469a-9212-f0cbfd8cbb52.png)
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98250
Approved by: https://github.com/rohan-varma