llm-foundry
af209b38 - Remote JSONL IFT data (#275)

2 years ago
Remote JSONL IFT data (#275)

* support remote jsonl files for IFT datasets
* improve docstring
* add support for other extensions
* don't duplicate validation check
* build dataset before tmpdir deletes
* parse uri
* only rank 0 download
* only download rank 0
* better error
* break earlier
* log more
* more reasonable destination str
* use data files format
* name points to a preprocessing function I guess
* debugging
* always something with HF
* json vs jsonl [no-ci]
* if hf wants it local, make it local [no-ci]
* back to tempfile [no-ci]
* debug
* debug hfds [no-ci]
* ... [no-ci]
* don't rename file
* use tempfile again
* updt

---------

Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
Co-authored-by: root <vitaliy@mosaicml.com>