Update TensorPipe submodule (#40614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40614
This update pulls in a one-line fix that sets the TCP_NODELAY option on the TCP sockets of the UV transport. This yields a large latency improvement, about 25x in one simple benchmark. It resolves a latency regression that TensorPipe had relative to the ProcessGroup agent and, in fact, makes TensorPipe about 2x faster than it.
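For context, TCP_NODELAY disables Nagle's algorithm, so small writes are sent immediately instead of being held back for coalescing. The actual fix lives in TensorPipe's UV transport; the sketch below is not that code. It only illustrates the same socket option using Python's standard `socket` module, and the helper name `make_low_latency_socket` is made up for this example:
```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 asks the kernel to send small segments right away
    # rather than buffering them while earlier data is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back to confirm it took effect (nonzero means enabled).
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```
This matters most for request/response traffic with small payloads, which is the pattern the benchmark exercises.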
The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
```
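import torch
import torch.distributed.rpc as rpc

# Trivial remote function, so the benchmark measures RPC overhead only.
@torch.jit.script
def remote_fn(t: int):
    return t

# Issues one million sequential RPCs to the "rhs" worker, waiting on each.
@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()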
```
And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205 us
- Gloo: 440 us
- TensorPipe with UV _before the fix_: 5 ms
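As a quick sanity check, the "about 25x" and "2x" figures follow directly from the numbers in this list. The dictionary keys below are labels invented for this sketch:
```python
# Average round-trip times from the list above, in microseconds.
rtt_us = {
    "tensorpipe_shm": 97.2,
    "tensorpipe_uv_after": 205.0,
    "gloo": 440.0,
    "tensorpipe_uv_before": 5000.0,  # 5 ms
}


def speedup(slower: str, faster: str) -> float:
    """Ratio of the slower configuration's latency to the faster one's."""
    return rtt_us[slower] / rtt_us[faster]


# Improvement of the UV transport due to the fix (roughly 24x).
uv_fix_speedup = speedup("tensorpipe_uv_before", "tensorpipe_uv_after")
# UV transport after the fix compared with Gloo (roughly 2x).
uv_vs_gloo = speedup("gloo", "tensorpipe_uv_after")

print(f"UV after vs before the fix: {uv_fix_speedup:.1f}x")
print(f"UV after the fix vs Gloo: {uv_vs_gloo:.1f}x")
```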
Test Plan: Ran PyTorch RPC test suite
Differential Revision: D22255393
fbshipit-source-id: 3f6825d03317d10313704c05a9280b3043920507