pytorch
bf4a2817 - Retry connecting to TCP store on ECONNRESET (#25707)

Commit
6 years ago
Retry connecting to TCP store on ECONNRESET (#25707)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25707

The retry logic already handled ECONNREFUSED, which covers the case where the client is started before the server. It did not yet handle the case where the server has started but its listen backlog is exhausted. This may happen when starting many processes that all try to connect at the same time. The server implementation uses blocking I/O to read and write entire messages, so it may take a bit longer to call `accept(2)` on new connections compared to a fully event-driven approach.

This commit both increases the default listen backlog on the server side and implements retries on ECONNRESET after `connect(2)`.

Test Plan: Imported from OSS

Differential Revision: D17226958

Pulled By: pietern

fbshipit-source-id: 877a7758b29286e06039f31b5c900de094aa3100
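The two changes described in the summary can be illustrated with a minimal Python sketch. This is not the code from the commit (the TCP store itself is not shown here); the names `connect_with_retry` and `make_listener`, the retry interval, the timeout, and the backlog value of 2048 are all assumptions chosen for illustration.

```python
import errno
import socket
import time

# Errors treated as transient: the server is not listening yet
# (ECONNREFUSED) or its listen backlog is exhausted (ECONNRESET).
RETRYABLE_ERRNOS = {errno.ECONNREFUSED, errno.ECONNRESET}


def connect_with_retry(host, port, timeout=30.0, wait=0.05):
    """Connect to (host, port), retrying transient errors until `timeout` seconds pass.

    Hypothetical helper; the defaults are illustrative, not taken from the commit.
    """
    deadline = time.monotonic() + timeout
    while True:
        # A socket whose connect() failed should not be reused, so make a fresh one.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((host, port))
            return sock
        except OSError as exc:
            sock.close()
            # Give up on non-retryable errors or once the deadline has passed.
            if exc.errno not in RETRYABLE_ERRNOS or time.monotonic() >= deadline:
                raise
            time.sleep(wait)


def make_listener(host="127.0.0.1", port=0, backlog=2048):
    """Create a listening socket with a larger backlog (2048 is an assumed value)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    # The kernel may cap the backlog (for example via net.core.somaxconn on Linux).
    server.listen(backlog)
    return server
```

A client would call `connect_with_retry(host, port)` in place of a bare `connect`. A larger backlog makes resets less likely when many clients start at once, and the retry covers the cases where a reset or refusal still occurs.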