datasets
Support DataLoader with num_workers > 0 in streaming mode
#4375
Merged

Commits
  • make TorchIterableDataset work in parallel
    lhoestq committed 3 years ago
  • start writing some tests
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago
  • fix streaming extension and fsspec issues in subprocesses
    lhoestq committed 3 years ago
  • fix some tests
    lhoestq committed 3 years ago
  • fix more tests
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago
  • fix import
    lhoestq committed 3 years ago
  • fix and add tests
    lhoestq committed 3 years ago
  • fix patch (handle successive patches and builtins)
    lhoestq committed 3 years ago
  • revert unnecessary change to enriched_web_blg
    lhoestq committed 3 years ago
  • style
    lhoestq committed 3 years ago
  • use open locally to fix win permission errors
    lhoestq committed 3 years ago
  • keep file opened in read_csv
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago
  • fix compression for read_csv
    lhoestq committed 3 years ago
  • consistency of read_csv: don't infer compression for file-like objects
    lhoestq committed 3 years ago
  • stringify Path objects
    lhoestq committed 3 years ago
  • comments + raise error if sharding is ambiguous
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago
  • minor
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago
  • Update src/datasets/iterable_dataset.py
    lhoestq committed 3 years ago
  • Merge branch 'master' into parallel-torch-iterable-dataset
    lhoestq committed 3 years ago