Clean Up ZeRO (#60285)
Summary:
**Overview:**
Being relatively new to PyTorch and ZeRO, I found parts of the code slightly hard to follow. This change strives to clean up the `ZeroRedundancyOptimizer` code in `zero_redundancy_optimizer.py` by reorganizing some computations, making variable names more explicit and consistent, and unifying terminology in the documentation. The goal is to make the code easier to extend afterwards.
**Changes:**
1) `state_dict()`: The [logic](https://github.com/pytorch/pytorch/blob/85517a2b700a5abc0b38f53ce8c99404cd67db79/torch/distributed/optim/zero_redundancy_optimizer.py#L510) for updating the global `state_dict` with each rank's local `state_dict` is simplified and made more explicit. Notably, the `dict` [`local_index_to_param_id`](https://github.com/pytorch/pytorch/blob/85517a2b700a5abc0b38f53ce8c99404cd67db79/torch/distributed/optim/zero_redundancy_optimizer.py#L513) is unneeded. It maps `local_pg["params"][i]` to `id(global_pg["params"][i])`, so the update can instead make a single pass over both lists in tandem, effectively iterating over `i`, with no need for the explicit `dict`.
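A minimal sketch of the simplification, using stand-in values rather than the actual ZeRO internals: since the local and global `"params"` lists are index-aligned, the explicit index-to-id `dict` can be replaced by one pass over both lists in tandem.

```python
# Stand-ins for one rank's local param group and the matching global group
# (same length, same order); not the actual ZeroRedundancyOptimizer state.
local_pg = {"params": ["lw0", "lw1", "lw2"]}
global_pg = {"params": ["gw0", "gw1", "gw2"]}

# Before: build a dict mapping local index i -> id(global param i), then
# consult it while looping over the local params.
local_index_to_param_id = {i: id(p) for i, p in enumerate(global_pg["params"])}
id_to_param = {id(p): p for p in global_pg["params"]}
pairs_via_dict = [
    (local_p, id_to_param[local_index_to_param_id[i]])
    for i, local_p in enumerate(local_pg["params"])
]

# After: a single pass over both lists in tandem yields the same pairing.
pairs_via_zip = list(zip(local_pg["params"], global_pg["params"]))
```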
2) `_update_trainable()`: The function [initializes](https://github.com/pytorch/pytorch/blob/85517a2b700a5abc0b38f53ce8c99404cd67db79/torch/distributed/optim/zero_redundancy_optimizer.py#L597) the local optimizer if it does not exist. I am unaware of any reason for the local optimizer to be destroyed after initialization, so I moved that logic to its own function `_init_local_optimizer()`, which is called once in the constructor.
After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654706728), I removed the function `_update_trainable()` itself in favor of adding a check for `parameters_as_bucket_view` in `build_param_buckets()` directly.
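The refactor can be sketched as follows; the method names follow the PR, but the class bodies are illustrative and `LocalOptimizer` is a hypothetical stand-in for a real optimizer class such as `torch.optim.SGD`.

```python
class LocalOptimizer:  # stand-in for e.g. torch.optim.SGD
    def __init__(self, params):
        self.params = params

class Before:
    def __init__(self, params):
        self.params = params
        self.optim = None

    def _update_trainable(self):
        # Lazy: initializes the local optimizer only if it does not exist,
        # so callers must remember to invoke this before using self.optim.
        if self.optim is None:
            self.optim = LocalOptimizer(self.params)

class After:
    def __init__(self, params):
        self.params = params
        self._init_local_optimizer()  # called exactly once, up front

    def _init_local_optimizer(self):
        self.optim = LocalOptimizer(self.params)
```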
3) `rank_local_state_dict()`: This [function](https://github.com/pytorch/pytorch/blob/85517a2b700a5abc0b38f53ce8c99404cd67db79/torch/distributed/optim/zero_redundancy_optimizer.py#L528) is currently broken. It appears to be legacy and relies on the input `state_dict` to have the key `"partitions"`. For now, I have removed it and added an [issue](https://github.com/pytorch/pytorch/issues/60284). Is it a notable use case to want to access another rank's `state_dict` in particular (as opposed to consolidating the entire state and then accessing)?
4) `local_state_dict()`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r655571043), I removed the function.
5) `partition_parameters()`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654708183), I renamed the function to `_partition_parameters()` to mark it as private.
6) `_param_to_index`: After [discussion](https://github.com/pytorch/pytorch/pull/60285#discussion_r654828100), I changed the key to be the parameter itself rather than its integer ID.
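A small sketch of why keying by the parameter works: torch parameters hash by object identity, so a `dict` keyed by the parameter itself behaves like one keyed by `id(param)` while avoiding the indirection. Here `Param` is a hypothetical stand-in for `torch.nn.Parameter` (plain Python objects also hash by identity by default).

```python
class Param:
    pass  # default object hashing is identity-based, like torch tensors

params = [Param(), Param(), Param()]

# Before: keys are the integer ids of the parameters.
param_id_to_index = {id(p): i for i, p in enumerate(params)}

# After: keys are the parameters themselves; lookups read more naturally.
param_to_index = {p: i for i, p in enumerate(params)}
```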
7) `buckets`: I renamed the data structure to `_buckets` to mark it as private.
8) Terminology: I tried to reduce the number of distinct terms in use rather than juggling several synonyms. In particular, I made an effort to distinguish between "local" and "global" and to make names more indicative of their types.
9) Style: Per the [PyTorch contributing guide](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation), I made all docstrings abide by the 80-character limit, except for the one [line](https://github.com/andwgu/pytorch/blob/554891f6faa764c76dec4afb1107cb5aa88ef589/torch/distributed/optim/zero_redundancy_optimizer.py#L142) showing the example ZeRO usage. Some code lines violate the limit for readability. Also, I unified some minor stylistic conventions out of habit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60285
Test Plan:
The test suite passes as expected (on the AI AWS cluster):
```
gpurun python test/distributed/optim/test_zero_redundancy_optimizer.py
```
I visually inspected the generated HTML doc (as generated following [this](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md#writing-documentation)).
Reviewed By: mrshenli
Differential Revision: D29320726
Pulled By: andwgu
fbshipit-source-id: 23f69a19ecc5e877a38fe1df0da11329428311dd