Reduce the amount of work done while holding a global lock in ParallelLoadOp (#43508)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43508
Differential Revision: D22952007
fbshipit-source-id: 11e28d20175271e6068edce8cb36f9fcf867a02a