pytorch
1b578c4b - [DataLoader] Close byte stream explicitly (#58938)

Commit
3 years ago
[DataLoader] Close byte stream explicitly (#58938)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58938

When running `test_datapipe.py`, Python's `gc` reports many `ResourceWarning`s due to unclosed streams. Besides being annoying, this points to two potential problems:
- Performance regression, because `gc` requires additional memory and computation to track references
- Python's `gc` runs only periodically, so we may hit a "too many open files" error due to OS limits

To reduce the warnings:
- Explicitly close byte streams
- Modify `test_datapipe.py` to use context managers

Small fix:
- Reorder imports in `test_datapipe.py`

Further investigation:
Can we directly use a context manager in `LoadFileFromDisk` and `ReadFileFromTar` to eliminate this error?
- Probably not. It's feasible only if the pipeline is synchronous and has no prefetching. With asynchronous execution or prefetching enabled, the scope guard of the context manager doesn't work.
- We may need to attach a reference counter to these file byte streams so that they close themselves.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D28689862

Pulled By: ejguan

fbshipit-source-id: bb2a85defb8a4ab5384db902ef6ad062185c2653
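The explicit-close pattern described in the summary can be sketched in plain Python. This is a minimal illustration, not the code changed in this commit: `load_files` and `demo` are hypothetical names, and `load_files` only stands in for a datapipe that yields open byte streams. It shows why the producer cannot use `with` (the stream must outlive the `yield`) and how the consumer closes each stream once it has been read.

```python
import os
import tempfile


def load_files(paths):
    """Hypothetical stand-in for a file-loading datapipe.

    Yields (path, open binary stream) pairs. The stream is still open when
    the consumer receives it, so the producer cannot own its lifetime.
    """
    for path in paths:
        yield path, open(path, "rb")


def demo():
    """Write two small files, read them back, and close every stream explicitly."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, payload in [("a.bin", b"alpha"), ("b.bin", b"beta")]:
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)

        contents = []
        opened = []
        for _, stream in load_files(paths):
            opened.append(stream)
            try:
                contents.append(stream.read())
            finally:
                # Close as soon as the data is consumed instead of waiting
                # for the garbage collector to finalize the file object.
                stream.close()
        return contents, opened
```

Calling `demo()` returns the file contents along with the stream objects, all of which report `closed == True`. Without the `finally` block, running under `python -W error::ResourceWarning` (or with warnings enabled in a test runner) would typically surface the unclosed-file warnings the summary mentions.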