Disable moco on CPU (#838)
Summary:
When I run this model on CPU, I get:
```
$ ./torchbench.py --nothing -k moco
Traceback (most recent call last):
  File "./torchbench.py", line 1021, in <module>
    main()
  File "./torchbench.py", line 912, in main
    for device, name, model, example_inputs in iter_models(args):
  File "./torchbench.py", line 112, in iter_models
    yield load_model(device, model_name, args.training, args.check_accuracy)
  File "./torchbench.py", line 140, in load_model
    benchmark = benchmark_cls(test="eval", device=device, jit=False)
  File "/home/jansel/torchbenchmark/torchbenchmark/util/model.py", line 14, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/home/jansel/torchbenchmark/torchbenchmark/models/moco/__init__.py", line 65, in __init__
    self.model = torch.nn.parallel.DistributedDataParallel(
  File "/home/jansel/pytorch/torch/nn/parallel/distributed.py", line 574, in __init__
    self._log_and_throw(
  File "/home/jansel/pytorch/torch/nn/parallel/distributed.py", line 676, in _log_and_throw
    raise err_type(err_msg)
ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device None, and module parameters {device(type='cpu')}.
```
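For context, `DistributedDataParallel` only accepts `device_ids` for single-device GPU modules; wrapping a module whose parameters live on CPU with `device_ids=[0]` always trips this check. A minimal sketch that reproduces the error, assuming a single-rank gloo process group (the init_method/port are arbitrary for illustration):
```
import torch
import torch.distributed as dist

# Single-rank process group so DistributedDataParallel can be constructed.
dist.init_process_group(
    backend="gloo", init_method="tcp://localhost:29500", rank=0, world_size=1
)

cpu_module = torch.nn.Linear(4, 4)  # parameters stay on CPU

# Passing device_ids for a CPU module raises the ValueError shown above;
# for CPU modules, device_ids must be left as None.
ddp = torch.nn.parallel.DistributedDataParallel(cpu_module, device_ids=[0])
```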
If I comment out the DistributedDataParallel line, I get:
```
Traceback (most recent call last):
  File "./torchbench.py", line 1021, in <module>
    main()
  File "./torchbench.py", line 915, in main
    run_one_model(
  File "./torchbench.py", line 971, in run_one_model
    correct_result = model_iter_fn(copy.deepcopy(model), example_inputs)
  File "./torchbench.py", line 471, in forward_pass
    return mod(*inputs)
  File "/home/jansel/pytorch/torch/nn/modules/module.py", line 1111, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jansel/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 133, in forward
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
  File "/home/jansel/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jansel/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 76, in _batch_shuffle_ddp
    x_gather = concat_all_gather(x)
  File "/home/jansel/pytorch/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jansel/torchbenchmark/torchbenchmark/models/moco/moco/builder.py", line 172, in concat_all_gather
    torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
  File "/home/jansel/pytorch/torch/distributed/distributed_c10d.py", line 2062, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
```
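The forward pass still calls `torch.distributed.all_gather`, which needs an initialized process group whose backend supports the tensors' device; for CPU tensors that means gloo. A minimal sketch of the collective that `concat_all_gather` relies on (the setup below is illustrative, not what the benchmark actually configures):
```
import torch
import torch.distributed as dist

# all_gather requires an initialized process group; gloo is the backend
# that handles CPU tensors.
dist.init_process_group(
    backend="gloo", init_method="tcp://localhost:29501", rank=0, world_size=1
)

tensor = torch.randn(8)
gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, tensor, async_op=False)
output = torch.cat(gathered, dim=0)  # what concat_all_gather returns
```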
It seems like this model is not configured properly to run on CPU.
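Since the moco benchmark depends on DistributedDataParallel and distributed collectives, the simplest resolution is to disable it on CPU. A hypothetical sketch of the guard (the exact constructor signature and mechanism in the linked PR may differ):
```
# Hypothetical guard in torchbenchmark/models/moco/__init__.py.
class Model:
    def __init__(self, test: str, device: str, jit: bool = False):
        if device != "cuda":
            # moco wraps its encoders in DistributedDataParallel and calls
            # torch.distributed.all_gather in its forward pass, so the
            # benchmark is skipped entirely on CPU.
            raise NotImplementedError("The moco model only supports CUDA devices.")
        ...
```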
Pull Request resolved: https://github.com/pytorch/benchmark/pull/838
Reviewed By: xuzhao9
Differential Revision: D35272714
Pulled By: jansel
fbshipit-source-id: e543f42662ad5b9c413d7d04a7a627201104b75c