datasets
Support DataLoader with num_workers > 0 in streaming mode
#4375
Merged

lhoestq merged 24 commits into master from parallel-torch-iterable-dataset
lhoestq make TorchIterableDataset work in parallel
58289fc3
lhoestq start writing some tests
c26a9f11
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
4c3ce960
lhoestq fix streaming extension and fsspec issues in subprocesses
8c60fa30
lhoestq fix some tests
6dc859e5
lhoestq fix more tests
7056f1a0
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
edef69b1
lhoestq fix import
c0a0492e
lhoestq fix and add tests
7043816a
lhoestq fix patch (handle successive patches and builtins)
a9ea9559
lhoestq revert unnecessary change to enriched_web_nlg
07d4c0e4
lhoestq style
af5de1ac
lhoestq use open locally to fix win permission errors
b84ae0ea
lhoestq keep file opened in read_csv
17467121
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
bc837ce0
lhoestq fix compression for read_csv
fe269bff
lhoestq consistency of read_csv: don't infer compression for file-like objects
482c4fbe
lhoestq stringify Path objects
54e9f39c
lhoestq marked this pull request as ready for review 3 years ago
lhoestq requested a review from mariosasko 3 years ago
lhoestq commented on 2022-06-07
lhoestq commented on 2022-06-07
lhoestq commented on 2022-06-07
lhoestq commented on 2022-06-07
mariosasko commented on 2022-06-08
lhoestq comments + raise error if sharding is ambiguous
8f5579eb
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
ab91dbdf
lhoestq minor
1b87fb3b
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
b675a694
mariosasko approved these changes on 2022-06-10
lhoestq Update src/datasets/iterable_dataset.py
816d5912
lhoestq Merge branch 'master' into parallel-torch-iterable-dataset
ff586c46
lhoestq merged ab7d3045 into master 3 years ago
lhoestq deleted the parallel-torch-iterable-dataset branch 3 years ago
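
For context on the title: reading a streaming dataset from several DataLoader workers generally requires that each worker reads a different part of the data, otherwise every example is yielded once per worker. The sketch below shows one common way to do that, a round-robin split of shards across workers. It is an illustration of the general idea only, not the code merged in this PR; the function name `shards_for_worker` and the shard names are hypothetical.

```python
from typing import List, Sequence


def shards_for_worker(shards: Sequence[str], worker_id: int, num_workers: int) -> List[str]:
    """Return the shards one worker should read.

    Assumption for this sketch: shards are assigned round-robin, so
    worker i reads shards i, i + num_workers, i + 2 * num_workers, ...
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError(f"worker_id must be in [0, {num_workers}), got {worker_id}")
    return list(shards[worker_id::num_workers])


# Hypothetical shard names, for illustration only.
shards = ["shard-0", "shard-1", "shard-2", "shard-3", "shard-4"]

# With two workers, each shard is read by exactly one worker.
per_worker = [shards_for_worker(shards, i, 2) for i in range(2)]
```

Inside a PyTorch worker process, the worker index and the worker count are available from `torch.utils.data.get_worker_info()` (its `id` and `num_workers` attributes), which is where values like `worker_id` and `num_workers` above would come from.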