[1.6] Update TensorPipe submodule (#40634)
Upstream PR: #40614
Summary:
This update pulls in a one-line fix that sets the TCP_NODELAY option on the TCP sockets of the UV transport. This cuts latency dramatically, by about 25x in one simple benchmark. It resolves a latency regression that TensorPipe had compared to the ProcessGroup agent, and TensorPipe in fact ends up about 2x faster than it.
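For context, TCP_NODELAY disables Nagle's algorithm, which otherwise holds back small outgoing segments while earlier data is still unacknowledged. Small request/response traffic such as RPCs is particularly sensitive to that delay. The sketch below only illustrates what the socket option does, using Python's standard `socket` module. It is not TensorPipe's code: the UV transport is built on libuv, which exposes the same option through `uv_tcp_nodelay`.

```python
import socket

# Create an (unconnected) TCP socket and disable Nagle's algorithm on it.
# With TCP_NODELAY set, small writes are sent right away instead of being
# held back to be coalesced with later data.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a non-zero value means it is enabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
sock.close()
```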
The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
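```
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def remote_fn(t: int) -> int:
    return t

@torch.jit.script
def local_fn():
    # Assumes RPC is already initialized and a peer worker is named "rhs".
    # Each iteration is one synchronous round trip.
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```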
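The description does not say how the per-iteration time was measured. One plausible approach, assuming wall-clock timing of a single call to a loop like `local_fn` divided by its iteration count, is the hypothetical helper `average_latency_us` below.

```python
import time
from typing import Callable


def average_latency_us(run_loop: Callable[[], None], iterations: int) -> float:
    """Time one call to `run_loop`, which is assumed to perform `iterations`
    sequential round trips, and return the mean time per round trip in µs."""
    start = time.perf_counter()
    run_loop()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


# Stand-in loop for illustration; with RPC set up one would pass local_fn
# and 1_000_000 instead.
def dummy_loop() -> None:
    for _ in range(1000):
        pass


print(f"{average_latency_us(dummy_loop, 1000):.3f} us per iteration")
```

Timing the whole loop once, rather than each iteration individually, keeps timer overhead out of very short round trips.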
The average round-trip time (one iteration) was:
- TensorPipe with SHM: 97.2 µs
- TensorPipe with UV _after the fix_: 205 µs
- ProcessGroup agent with Gloo: 440 µs
- TensorPipe with UV _before the fix_: 5 ms
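The headline ratios follow directly from these numbers. Because the before-fix figure is given only as a rounded "5ms", the speedup is approximate:

```python
# Average round-trip latencies from the list above, in microseconds.
latencies_us = {
    "TensorPipe (SHM)": 97.2,
    "TensorPipe (UV, after fix)": 205.0,
    "ProcessGroup (Gloo)": 440.0,
    "TensorPipe (UV, before fix)": 5000.0,
}

after = latencies_us["TensorPipe (UV, after fix)"]
fix_speedup = latencies_us["TensorPipe (UV, before fix)"] / after
vs_gloo = latencies_us["ProcessGroup (Gloo)"] / after

print(f"UV fix speedup: {fix_speedup:.1f}x")  # 24.4x, i.e. "about 25x"
print(f"UV vs Gloo: {vs_gloo:.2f}x")          # 2.15x, i.e. "about 2x"
```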
Test Plan: Ran the PyTorch RPC test suite.