Test for distributed RL with RPC (#52393)
Summary:
Addresses one item in https://github.com/pytorch/pytorch/issues/46321
## Background
This is a test version of the RL RPC example from the [examples repo](https://github.com/pytorch/examples/blob/master/distributed/rpc/rl/main.py) and the [RPC tutorial](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html), with the following differences:
* It defines and uses a `DummyEnv` to avoid a dependency on `gym`. The `DummyEnv` simply returns random states and rewards for a small number of iterations; a minimal sketch of such an environment is shown after this list.
* It removes the `ArgumentParser` and instead uses `RpcAgentTestFixture` plus hard-coded constants for configuration and launching.
* It changes the worker names to match what the internal Thrift RPC tests expect.
Outside of these differences, the code is intentionally kept as close to the original example as possible.
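
For reference, a `DummyEnv` along these lines might look like the sketch below. It assumes the gym-style `seed`/`reset`/`step` interface the original example relies on; the class layout, constructor arguments, and default values here are illustrative, not necessarily the exact ones in this PR.

```python
import torch


class DummyEnv:
    """Gym-free stand-in environment: returns random states and rewards
    for a fixed number of steps, then reports the episode as done."""

    def __init__(self, state_dim=4, num_iters=10, reward_threshold=475.0):
        self.state_dim = state_dim
        self.num_iters = num_iters
        self.reward_threshold = reward_threshold
        self.iter = 0

    def seed(self, manual_seed):
        # Mirror gym's seeding hook so the example code needs no changes.
        torch.manual_seed(manual_seed)

    def reset(self):
        # Start a new episode with a fresh random state.
        self.iter = 0
        return torch.randn(self.state_dim)

    def step(self, action):
        # Ignore the action; produce a random next state and reward,
        # and finish the episode after num_iters steps.
        self.iter += 1
        state = torch.randn(self.state_dim)
        reward = torch.rand(1).item() * self.reward_threshold
        done = self.iter >= self.num_iters
        return state, reward, done, {}
```

Because this mirrors the small slice of the `gym.Env` interface the example actually uses, the agent/observer RPC code can drive it with essentially no changes.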
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52393
Test Plan:
```
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_rl_rpc -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_rl_rpc -vs
```
Reviewed By: glaringlee
Differential Revision: D26515435
Pulled By: jbschlosser
fbshipit-source-id: 548548c4671fe353d83c04108580d807108ca76e