only scatter in forward if multi-device per process (#22384)
Summary:
Scatter is unnecessary if only using one device, and it breaks on some custom data structures like namedtuple, so would like to avoid :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22384
Differential Revision: D16428208
Pulled By: soumith
fbshipit-source-id: eaa3876b2b95c1006ccaaacdb62f54c5280e730c