[DataLoader] Close byte stream explicitly (#58938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58938
When running `test_datapipe.py`, Python's `gc` reports many `ResourceWarning`s caused by unclosed streams. Besides being noisy, this points to two potential problems:
- Performance regression, because `gc` needs additional memory and computation to track references
- Python's `gc` only runs periodically, so we may hit a "too many open files" error due to OS limits before the streams are collected
To reduce the warnings:
- Explicitly close byte stream
- Modify `test_datapipe.py` to use context manager
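The consumer-side pattern can be sketched roughly as follows. This is only an illustration, not the code in this PR: `iter_file_streams` and `read_payloads` are hypothetical names, and the generator is a stand-in for a datapipe that yields `(pathname, byte stream)` pairs.

```python
def iter_file_streams(paths):
    # Hypothetical stand-in for a datapipe that yields (pathname, byte stream).
    # The stream is opened here but NOT closed; the consumer owns its lifetime.
    for path in paths:
        yield path, open(path, "rb")


def read_payloads(paths):
    # Consume each stream inside a context manager so it is closed as soon as
    # we are done with it, instead of waiting for garbage collection.
    payloads = []
    for path, stream in iter_file_streams(paths):
        with stream:
            payloads.append((path, stream.read()))
    return payloads
```

Calling `stream.close()` in a `try`/`finally` block is equivalent; the context manager is just the shorter spelling.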
Small fix:
- Reorder import in `test_datapipe.py`
Further investigation:
Can we use a context manager directly in `LoadFileFromDisk` and `ReadFileFromTar` to eliminate this warning?
- Probably not. It is feasible only if the pipeline is synchronous and does no prefetching. Once asynchronous execution or prefetching is enabled, the scope guard of the context manager no longer works.
- We may need to attach a reference counter to these file byte streams so that they close themselves.
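One possible shape for such a reference counter is sketched below. Nothing here exists in the codebase: `RefCountedStream`, `acquire`, and `release` are hypothetical names, and the design (an explicit count guarded by a lock) is just one assumption about how it could work.

```python
import threading


class RefCountedStream:
    """Hypothetical wrapper that closes the stream when the last holder releases it."""

    def __init__(self, stream):
        self._stream = stream
        self._count = 1  # the creator holds the first reference
        self._lock = threading.Lock()

    def acquire(self):
        # Register one more holder, e.g. a prefetch worker.
        with self._lock:
            if self._count == 0:
                raise ValueError("stream is already closed")
            self._count += 1
        return self

    def release(self):
        # Drop one reference; close the underlying stream when none remain.
        with self._lock:
            if self._count == 0:
                return
            self._count -= 1
            last = self._count == 0
        if last:
            self._stream.close()

    def read(self, size=-1):
        return self._stream.read(size)

    @property
    def closed(self):
        return self._stream.closed
```

With this sketch, each stage that keeps the stream past its own scope would call `acquire()` and later `release()`, so the file is closed deterministically even when the stages run out of order.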
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28689862
Pulled By: ejguan
fbshipit-source-id: bb2a85defb8a4ab5384db902ef6ad062185c2653