Fix inaccurate note in DistributedDataParallel (#47156)
Summary:
Sorry for my previous inaccurate [PR](https://github.com/pytorch/pytorch/pull/42471#issue-462329192).
Here is some toy code to illustrate my point:
* non-DistributedDataParallel version
```python
import torch

if __name__ == "__main__":
    torch.manual_seed(0)
    # duplicate the same sample so the batch of 2 mirrors two DDP ranks
    inp = torch.randn(1, 16)
    inp = torch.cat([inp, inp], dim=0)
    model = torch.nn.Linear(16, 2)
    loss_func = torch.nn.CrossEntropyLoss()
    opti = torch.optim.SGD(model.parameters(), lr=0.001)

    opti.zero_grad()
    loss = loss_func(model(inp), torch.tensor([0, 0]))
    loss.backward()
    opti.step()
    print("grad:", model.weight.grad)
    print("updated weight:\n", model.weight)
```
* DistributedDataParallel version
```python
import os

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.multiprocessing import Process

def run(rank, size):
    # same seed on every rank, so all ranks build identical models and data
    torch.manual_seed(0)
    x = torch.randn(1, 16)
    model = torch.nn.Linear(16, 2)
    model = torch.nn.parallel.DistributedDataParallel(model)
    loss_func = torch.nn.CrossEntropyLoss()
    opti = torch.optim.SGD(model.parameters(), lr=0.001)

    opti.zero_grad()
    y = model(x)
    label = torch.tensor([0])
    loss = loss_func(y, label)
    loss.backward()  # DDP averages the gradients across ranks here
    opti.step()
    if rank == 0:
        print("grad:", model.module.weight.grad)
        print("updated weight:\n", model.module.weight)

def init_process(rank, size, fn, backend="gloo"):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    size = 2
    processes = []
    for rank in range(size):
        p = Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```
Both pieces of code print the same gradients and updated weights. This is expected: `CrossEntropyLoss` with the default `reduction='mean'` averages the loss over the batch of two, while DDP averages the gradients across the two ranks (each holding one copy of the same sample), so the two setups compute identical gradients.
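The same equivalence can be checked without spawning processes at all. Below is a minimal sketch comparing a batch-of-two gradient against the mean of the two per-sample gradients, which is what DDP's allreduce produces; the `weight_grad` helper is introduced here purely for illustration and is not part of the examples above.

```python
import torch

def weight_grad(batch, labels):
    # fresh Linear with a fixed init so every call is comparable
    torch.manual_seed(0)
    model = torch.nn.Linear(16, 2)
    torch.nn.CrossEntropyLoss()(model(batch), labels).backward()
    return model.weight.grad.clone()

torch.manual_seed(1)
a, b = torch.randn(1, 16), torch.randn(1, 16)

# one process, batch of 2: the mean-reduced loss averages over the batch
g_batch = weight_grad(torch.cat([a, b], dim=0), torch.tensor([0, 0]))
# two DDP ranks: each computes a batch-1 gradient, allreduce averages them
g_ddp = (weight_grad(a, torch.tensor([0])) + weight_grad(b, torch.tensor([0]))) / 2

print(torch.allclose(g_batch, g_ddp))  # True
```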
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47156
Reviewed By: mruberry
Differential Revision: D24675199
Pulled By: mrshenli
fbshipit-source-id: 1238a63350a32a824b4b8c0018dc80454ea502bb