Support replicating multi-GPU modules (#18687)
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list whose first row, `devices[0]`, matches the devices `network` already occupies. See #18591.
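
The constraint on `devices` can be illustrated with a small pure-Python sketch. This is not the code added in this PR: `check_replica_devices` is a hypothetical helper, device IDs are modeled as plain integers, and treating "matching" as an ordered, element-wise comparison is an assumption made only for illustration.

```python
from typing import Sequence


def check_replica_devices(network_devices: Sequence[int], devices: Sequence) -> None:
    """Hypothetical validator (not part of PyTorch) for the rule in the summary.

    network_devices: device IDs the module currently lives on.
    devices: the replication target passed by the caller.
    """
    if len(network_devices) <= 1:
        # Single-device module: the summary states no 2D requirement here.
        return

    # A multi-GPU module needs a non-empty list of rows (a 2D list).
    is_2d = bool(devices) and all(isinstance(row, (list, tuple)) for row in devices)
    if not is_2d:
        raise ValueError("multi-GPU module requires a 2D devices list")

    # Assumption: "matching" means same IDs in the same order.
    if list(devices[0]) != list(network_devices):
        raise ValueError("devices[0] must match the module's devices")


# A module spread over GPUs 0 and 1, replicated onto GPUs 2 and 3.
check_replica_devices([0, 1], [[0, 1], [2, 3]])

# A single-GPU module passes this sketch's check with a flat list.
check_replica_devices([0], [0, 1])
```

Under these assumptions, a flat list such as `[0, 1, 2, 3]` for a module on GPUs 0 and 1, or a 2D list whose first row is `[2, 3]`, would raise `ValueError`.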
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687
Differential Revision: D14706162
Pulled By: mrshenli
fbshipit-source-id: dca630d3308f2dbcf8b75629c452d7a64092ba42