Release GIL when doing shared memory copies on Tensors (#85389)
See the discussion here for context: https://pytorch.slack.com/archives/GEEQ2K4MD/p1663672716533319?thread_ts=1662155536.133099&cid=GEEQ2K4MD. Opening a PR as suggested by @albanD.
Currently, PyTorch holds the GIL while copying Tensors into shared memory. For certain workloads it would be useful to copy different tensors into shared memory in parallel, but with the GIL held the copies can never truly run concurrently. This PR releases the GIL for the duration of the copy.
Here's a short example of this:
```
import time
from multiprocessing.pool import ThreadPool

import torch

# Build 64 * 8 = 512 tensors of ~37.5 MB each.
tensors = []
for i in range(64):
    for j in range(8):
        t = torch.ones(128, 480, 640).type(torch.uint8) * i
        tensors.append(t)
print("Done generating input tensors")

# Move every tensor into shared memory from a pool of 8 threads;
# the copies can only overlap if share_memory_ releases the GIL.
with ThreadPool(processes=8) as pool:
    futures = []
    before = time.time()
    for t in tensors:
        future = pool.apply_async(t.share_memory_)
        futures.append(future)
    for f in futures:
        f.get()
    elapsed = time.time() - before
    print("ELAPSED TIME", elapsed)
```
With this diff, I get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 3.561321258544922
~$
```
Previously, I would get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 16.305657386779785
~$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85389
Approved by: https://github.com/albanD