Add devices to TensorPipe options (#56405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56405
If not provided, the `devices` field will be initialized to the local
devices in this worker's `device_maps` plus the corresponding devices
in peers' `device_maps`. When processing CUDA RPC requests, the agent
will use a dedicated stream for each device in the `devices` list to
(1) receive CUDA tensor arguments, (2) run user functions, and (3) send
return-value tensors.
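The default-initialization rule above can be sketched in plain Python.
This is a minimal sketch, not the agent's implementation: the helper
`infer_devices`, the nested-dict layout `all_device_maps[worker][dst]`,
and the device strings are hypothetical. It assumes "local devices"
are the keys of this worker's maps and "corresponding devices" are the
values of peer maps whose destination is this worker.

```python
from typing import Dict, List, Set

# Hypothetical types: a device is a string such as "cuda:0"; a device map
# maps a device on the owning worker to a device on the destination worker.
DeviceMap = Dict[str, str]


def infer_devices(
    me: str,
    all_device_maps: Dict[str, Dict[str, DeviceMap]],
) -> List[str]:
    """Return the sorted devices worker `me` would use by default.

    `all_device_maps[worker][dst]` is the (hypothetical) device map that
    `worker` configured for calls to `dst`.
    """
    devices: Set[str] = set()
    # Local devices: keys of every device map this worker configured.
    for device_map in all_device_maps.get(me, {}).values():
        devices.update(device_map.keys())
    # Corresponding devices: values of any map a peer configured with
    # this worker as the destination (those devices live on `me`).
    for worker, maps in all_device_maps.items():
        if worker == me:
            continue
        devices.update(maps.get(me, {}).values())
    return sorted(devices)


if __name__ == "__main__":
    maps = {
        "worker0": {"worker1": {"cuda:0": "cuda:1"}},
        "worker1": {"worker0": {"cuda:2": "cuda:3"}},
    }
    print(infer_devices("worker0", maps))  # ['cuda:0', 'cuda:3']
    print(infer_devices("worker1", maps))  # ['cuda:1', 'cuda:2']
```

Deriving the list from both directions ensures that a device a peer
sends tensors to still gets a dedicated stream on the receiving worker,
even if that worker never named it in its own `device_maps`.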
closes #54017
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D27863133
Pulled By: mrshenli
fbshipit-source-id: 5d078c3b6d1812f85d62b0eb0f89f2b6c82cb060