llm-foundry
Remote JSONL IFT data
#275
Merged


vchiley merged 36 commits into main from remote-jsonl-ift
samhavens support remote jsonl files for IFT datasets
3f318634
samhavens improve docstring
be630c03
samhavens requested a review from alextrott16 2 years ago
samhavens add support for other extensions
eded4dfa
dakinggg commented on 2023-06-02
alextrott16 commented on 2023-06-02
samhavens don't duplicate validation check
e82ed189
dakinggg commented on 2023-06-02
alextrott16 commented on 2023-06-02
samhavens build dataset before tmpdir deletes
4a93eaaa
samhavens parse uri
92a14803
samhavens marked this pull request as ready for review 2 years ago
samhavens
dakinggg commented on 2023-06-04
samhavens only rank 0 download
f6380ecf
samhavens only download rank 0
8173998a
dakinggg approved these changes on 2023-06-06
alextrott16 approved these changes on 2023-06-06
samhavens better error
0f31e335
samhavens break earlier
c3676ff0
samhavens log more
67fc6151
samhavens more reasonable destination str
171aadb6
samhavens use data files format
4d420677
samhavens name points to a preprocessing function I guess
455c88a7
samhavens debugging
85dcf9bb
samhavens always something with HF
edabf535
samhavens json vs jsonl [no-ci]
93c50d1c
samhavens if hf wants it local, make it local [no-ci]
8b95fae7
samhavens back to tempfile [no-ci]
11204e1e
samhavens debug
3b9b85aa
samhavens debug hfds [no-ci]
b5158ebe
samhavens ... [no-ci]
a6be0623
samhavens don't rename file
ee9f402a
samhavens use tempfile again
246f6ff7
abhi-mosaic
vchiley Merge branch 'main' into remote-jsonl-ift
1211eae3
vchiley Merge branch 'main' into remote-jsonl-ift
bcf5ed62
updt
854e56bb
merge main and cleanup
f39209d1
updt
1f2f9fee
updt
89ee08e7
updt
d7a2f95b
updt
f8a44d97
updt
5aeac479
updt
14f52b51
updt
0139a458
lint
b777d383
vchiley
alextrott16 approved these changes on 2023-06-22
vchiley merged af209b38 into main 2 years ago
samhavens deleted the remote-jsonl-ift branch 2 years ago

