[C10D] Rewrite TCPStore client send path to minimize the number of syscalls. (#100742)
Accumulate data in a local buffer prior to sending it. This reduces
the number of syscalls and network packets.
We flush the buffer every 1440 bytes to cap how much temporary memory it uses.
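The buffering scheme above can be illustrated with a minimal C++ sketch. It assumes a hypothetical `SendBuffer` class and a `WriteFn` callback that stands in for the socket write. Each callback invocation models one syscall. These names are illustrative and are not necessarily TCPStore's actual internals. The 1440-byte default equals a 1500-byte Ethernet MTU minus a 40-byte IPv6 header and a 20-byte TCP header. Whether that motivated the choice is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sink standing in for the real socket write (e.g. send());
// each call models one syscall / one burst handed to the kernel.
using WriteFn = std::function<void(const std::uint8_t* data, std::size_t len)>;

// Accumulates outgoing bytes locally and hands them to the sink in batches.
class SendBuffer {
 public:
  explicit SendBuffer(WriteFn write, std::size_t watermark = 1440)
      : write_(std::move(write)), watermark_(watermark) {
    assert(write_ && "a write callback is required");
  }

  // Appends raw bytes; flushes once the buffered size reaches the watermark.
  void append(const void* data, std::size_t len) {
    const auto* bytes = static_cast<const std::uint8_t*>(data);
    buffer_.insert(buffer_.end(), bytes, bytes + len);
    maybeFlush();
  }

  // Appends a fixed-size value (command byte, length, ...) in host byte order.
  template <typename T>
  void appendValue(const T& value) {
    append(&value, sizeof(T));
  }

  // Appends a string as a 64-bit length prefix followed by its bytes.
  void appendString(const std::string& s) {
    appendValue<std::uint64_t>(s.size());
    append(s.data(), s.size());
  }

  // Sends everything currently buffered in a single write, if anything.
  void flush() {
    if (!buffer_.empty()) {
      write_(buffer_.data(), buffer_.size());
      buffer_.clear();
    }
  }

 private:
  void maybeFlush() {
    if (buffer_.size() >= watermark_) {
      flush();
    }
  }

  WriteFn write_;
  std::size_t watermark_;
  std::vector<std::uint8_t> buffer_;
};
```

In this sketch a request made of several small fields (a command byte, a length-prefixed key, and so on) costs one write at the final `flush()` instead of one write per field. The caller must call `flush()` once a request is complete, or the trailing bytes stay buffered. The watermark bounds only what accumulates between appends. A single large `append` is buffered in full before it is flushed, so peak memory is roughly the watermark plus the largest single append.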
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100742
Approved by: https://github.com/fduwjj