Remote JSONL IFT data (#275)
* support remote JSONL files for IFT datasets
* improve docstring
* add support for other extensions
* don't duplicate validation check
* build dataset before tmpdir deletes
* parse URI
* only rank 0 download
* only download rank 0
* better error
* break earlier
* log more
* more reasonable destination str
* use data files format
* name points to a preprocessing function I guess
* debugging
* always something with HF
* json vs jsonl [no-ci]
* if hf wants it local, make it local [no-ci]
* back to tempfile [no-ci]
* debug
* debug hfds [no-ci]
* ... [no-ci]
* don't rename file
* use tempfile again
* update
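
For readers skimming the list above, the overall flow (parse the URI, check the extension, download only on rank 0 into a temporary directory under the original file name, and build the dataset before the directory is cleaned up) might look roughly like the sketch below. This is an illustration and not the code from this PR: `SUPPORTED_EXTENSIONS`, `is_remote_uri`, `local_destination`, `fetch_on_rank_zero`, and the `download` callback are all hypothetical names, the extension list is an assumption, and in a real multi-process run the other ranks would need to wait at a barrier until rank 0 finishes. The caller is expected to invoke `fetch_on_rank_zero` inside a `with tempfile.TemporaryDirectory() as tmp_dir:` block and load the dataset before that block exits.

```python
import os
import tempfile
from typing import Callable
from urllib.parse import urlparse

# Assumption: the exact set of extensions accepted by the PR is not stated here.
SUPPORTED_EXTENSIONS = ('.jsonl', '.json', '.csv', '.parquet')


def is_remote_uri(path: str) -> bool:
    """Return True when the path has a non-file URI scheme, e.g. s3:// or gs://."""
    return urlparse(path).scheme not in ('', 'file')


def local_destination(uri: str, tmp_dir: str) -> str:
    """Map a remote URI to a path in tmp_dir, keeping the original file name."""
    name = os.path.basename(urlparse(uri).path)
    extension = os.path.splitext(name)[1]
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f'Unsupported extension {extension!r} in {uri!r}; '
            f'expected one of {SUPPORTED_EXTENSIONS}.')
    return os.path.join(tmp_dir, name)


def fetch_on_rank_zero(uri: str, tmp_dir: str,
                       download: Callable[[str, str], None],
                       rank: int) -> str:
    """Download the file on rank 0 only; every rank gets the same local path.

    `download` is a hypothetical callback taking (remote_uri, local_path).
    """
    destination = local_destination(uri, tmp_dir)
    if rank == 0:
        download(uri, destination)
    # Other ranks skip the download and would wait at a barrier here.
    return destination
```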
---------
Co-authored-by: Vitaliy Chiley <6439018+vchiley@users.noreply.github.com>
Co-authored-by: root <vitaliy@mosaicml.com>