[DataLoader] Short circuit pin_memory recursion when operating on bytes (#97737)
Slack thread: https://pytorch.slack.com/archives/GEEQ2K4MD/p1679962409906099
I was seeing some massive (~2x) slowdowns on a job after running it on PyTorch 2.0. From some profiling in `py-spy` it looked like the pin_memory thread was doing a lot more work than before. Looking at a trace in `nsys` I saw the thread doing the forward pass having a bunch of `pthread_cond_timedwait` with GIL reacquire calls in it’s call stack, and it seemed like the thread doing the forward pass was getting blocked (waiting for the GIL) by the pin memory thread (which was holding the GIL).
After some debugging I found out the issue. If a `bytes` was passed into `pin_memory`, previously in 1.13 (before https://github.com/pytorch/pytorch/pull/94709) it would short-circuit and return here
https://github.com/pytorch/pytorch/blob/d922c29a22e4bf0fba49526f7536395eb8cd66f4/torch/utils/data/_utils/pin_memory.py#L54-L55
since `bytes` was in `torch._six.string_classes`:
```
>>> from torch._six import string_classes
>>> string_classes
(<class 'str'>, <class 'bytes'>)
>>>
```
However after https://github.com/pytorch/pytorch/pull/94709, if a `bytes` was passed into `pin_memory` it would fall into here instead
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L68-L73
because the previous check is now doing `isinstance(data, str)` instead of `isinstance(data, (str, bytes))`!
https://github.com/pytorch/pytorch/blob/c263bd43e8e8502d4726643bc6fd046f0130ac0e/torch/utils/data/_utils/pin_memory.py#L56-L57
As a result, `pin_memory` gets called recursively for each element in the `bytes` leading to a ton of wasted recursion. This also explains the slowdown / GIL contention I was seeing.
This PR simply changes `isinstance(data, str)` to `isinstance(data, (str, bytes))` to match the behavior before https://github.com/pytorch/pytorch/pull/94709
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97737
Approved by: https://github.com/albanD, https://github.com/NivekT