refactor: unstructured ingest as a pipeline #1551
rbiseck3 force pushed from b780f476 to 37813cee 2 years ago
rbiseck3 marked this pull request as ready for review 2 years ago
rbiseck3 force pushed from 37813cee to 2df5fd30 2 years ago
rbiseck3 force pushed from d8592aee to 73619b40 2 years ago
rbiseck3 force pushed from b513dc97 to 1a998b68 2 years ago
rbiseck3 force pushed from f39a19c2 to 6fa71e19 2 years ago
rbiseck3 force pushed from 5923c932 to 00090215 2 years ago
rbiseck3 force pushed from 247958dd to 2474bd27 2 years ago
WIP: refactoring to support pipeline
758b43c9
WIP: added properties to serialization of ingest docs
12d59c6a
refactor pipeline approach
a07e4b13
complete all steps of pipeline
9ae5d3f0
Add step to copy to final destination
e60a0054
fix how hashing occurs to allow reproducibility
e8c1086e
Update sharepoint and s3 to use pipeline
a29f5fca
Update airtable to use pipeline
b63f6846
Update azure to use pipeline
bb86fef1
Update biomed to use pipeline
8ca2d199
Update box to use pipeline
018de959
Update confluence to use pipeline
9fe0a061
Update delta table to use pipeline
3ae7dfb5
Update discord to use pipeline
0e842802
Update dropbox to use pipeline
3e8169d0
Update elasticsearch to use pipeline
e5b8476d
Update fsspec to use pipeline
1b38a44c
Update gcs to use pipeline
fa5a979f
Update github to use pipeline
7f1be8d4
Update gitlab to use pipeline
f50008ec
Update google drive to use pipeline
09a9c3dd
Update jira to use pipeline
cd154c44
Update local to use pipeline
30e5de99
Update notion to use pipeline
67baf047
Update onedrive to use pipeline
330ba2da
Update outlook to use pipeline
1c2203ed
Update reddit to use pipeline
aeb5635e
Update salesforce to use pipeline
8f151a79
Update slack to use pipeline
64ee676c
Update wikipedia to use pipeline
fa1412bd
lint fixes
29e56b4b
Drop unit test of file that was removed
40572ab3
Update Changelog
20c86db5
Update tests to use explicit work dir
24b9fa23
Fix unit tests
ccdc1e22
Fix unit tests
21472aab
run pip-compile
a47ca6a4
run pip-compile
0f03e6c7
pin version of onnxruntime
2f37a806
Fix partition to run process_file
f9b684dd
fix getting source metadata in fsspec connector
b3eec1d9
Fix unit tests
b2b36da1
fix delta table dest connector
59dc01aa
fix delta s3 dest connector
62bd99ad
fix getting source metadata in salesforce connector
f7d87965
improve getting record info in salesforce connector
d902a924
Testing ingest tests fix
9ae6d7a5
revert deps
448cf351
fix PR comments
4bb636b0
rbiseck3 force pushed from c9f9c312 to 4bb636b0 2 years ago
fix getting source metadata lazily
53d4453c
fix linting
097cb243
Rename all runner classes to include Runner in name
078d35a9
ryannikolaidis changed the title from "roman/refactor ingest as pipeline" to "refactor: unstructured ingest as a pipeline" 2 years ago