rebuild_storage_fd retry on EINTR (#21723)
Summary:
Some DataLoader tests are flaky on Python 2, intermittently failing with the following error:
```
Jun 12 22:17:31 Traceback (most recent call last):
Jun 12 22:17:31 File "test_dataloader.py", line 798, in test_iterable_dataset
Jun 12 22:17:31 fetched = sorted([d.item() for d in dataloader_iter])
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 697, in __next__
Jun 12 22:17:31 idx, data = self._get_data()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 664, in _get_data
Jun 12 22:17:31 success, data = self._try_get_data()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 617, in _try_get_data
Jun 12 22:17:31 data = self.data_queue.get(timeout=timeout)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/queues.py", line 135, in get
Jun 12 22:17:31 res = self._recv()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/queue.py", line 22, in recv
Jun 12 22:17:31 return pickle.loads(buf)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1382, in loads
Jun 12 22:17:31 return Unpickler(file).load()
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 858, in load
Jun 12 22:17:31 dispatch[key](self)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/pickle.py", line 1133, in load_reduce
Jun 12 22:17:31 value = func(*args)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/site-packages/torch/multiprocessing/reductions.py", line 274, in rebuild_storage_fd
Jun 12 22:17:31 fd = multiprocessing.reduction.rebuild_handle(df)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 157, in rebuild_handle
Jun 12 22:17:31 new_handle = recv_handle(conn)
Jun 12 22:17:31 File "/opt/python/2.7.9/lib/python2.7/multiprocessing/reduction.py", line 83, in recv_handle
Jun 12 22:17:31 return _multiprocessing.recvfd(conn.fileno())
Jun 12 22:17:31 OSError: [Errno 4] Interrupted system call
```
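Python 2.7's `_multiprocessing.recvfd` calls `recvmsg` without retrying on `EINTR`: https://github.com/python/cpython/blob/2.7/Modules/_multiprocessing/multiprocessing.c#L174
If a signal arrives while `recvmsg` is blocked, the call fails with `OSError: [Errno 4] Interrupted system call` instead of being restarted. So `rebuild_storage_fd` should wrap the call in an outer loop that catches `OSError` with `errno == EINTR` and retries.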
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21723
Differential Revision: D15806247
Pulled By: ezyang
fbshipit-source-id: 16cb661cc0fb418fd37353a1fef7ceeb634f02b7