unstructured
feat: refactor ingest
#3009
Merged

rbiseck3 merged 53 commits into main from roman/refactor-ingest
rbiseck3 requested a review from ryannikolaidis 1 year ago
rbiseck3 requested a review from potter-potter 1 year ago
potter-potter commented on 2024-05-15
rbiseck3 force pushed from 2c41d9b8 to 002824fc 1 year ago
rbiseck3 force pushed from 002824fc to cc14e480 1 year ago
rbiseck3 changed the title from "feat: refactor ingest (WIP)" to "feat: refactor ingest" 1 year ago
ryannikolaidis commented on 2024-05-16
ryannikolaidis commented on 2024-05-16
rbiseck3 force pushed from 4cfd6a14 to aed69c3c 1 year ago
ryannikolaidis commented on 2024-05-17
rbiseck3 force pushed from d040a369 to 58cdb613 1 year ago
rbiseck3 force pushed from 58cdb613 to 253f1582 1 year ago
ryannikolaidis commented on 2024-05-17
ryannikolaidis commented on 2024-05-17
ryannikolaidis commented on 2024-05-17
ryannikolaidis commented on 2024-05-17
rbiseck3 force pushed from 0f216ed1 to 90d8d30c 1 year ago
rbiseck3 force pushed from 701bd89c to 9c35a593 1 year ago
rbiseck3 Create new interfaces to support more versatility in how ingest proce…
f48e6083
rbiseck3 Begin fleshing out pipeline
4945d099
rbiseck3 Add partitioner pipeline step
20fd7d1a
rbiseck3 Add chunker pipeline step
f8c18f3e
rbiseck3 Add upload pipeline step
7a6b8e44
rbiseck3 Support file level reprocess flag
7bc79dfe
rbiseck3 Add local destination as default
4d3a5c65
rbiseck3 Add support for uncompress via new pipeline step
3898b220
rbiseck3 Move files around
0f2822af
rbiseck3 Add s3 connector
c3e71132
rbiseck3 Add cli commands
db3fe7e5
rbiseck3 bring over more logic from original implementation
97c14b7e
rbiseck3 dynamically add new commands into existing list, annotated with v2 as…
59cce075
rbiseck3 fix fsspec inputs
0b1e72a6
rbiseck3 print all errors at the end of pipeline
934acb89
rbiseck3 Add optional limit on connections when using asyncio
88e75875
rbiseck3 Add entry to changelog
b0201ab0
rbiseck3 support python3.9
33bf0408
rbiseck3 improve type checking in fsspec connectors
b6a44344
rbiseck3 Add __future__ to top level __init__ for v2 code
e7739a94
rbiseck3 Add better type checking in cli command code
e7203a26
rbiseck3 update fsspec metadata to include record locator info
7bb91d92
rbiseck3 Fix endpoint param in s3 fsspec connector
d823510b
rbiseck3 Small optimization in getting access configs from s3 connector config
0dd164df
rbiseck3 Add recursive flag to local cli inputs
5f128de7
rbiseck3 Add checks when getting values from os.stat
dd2706de
rbiseck3 Add a classmethod to generate pipeline from configs
0ad80ca2
rbiseck3 Add dependency check wrapper for s3 connector
6951a3a2
rbiseck3 Add new README in v2
95ea1cd8
rbiseck3 Fix local connector
044c7589
rbiseck3 Fix await in s3 connector
2489290f
ryannikolaidis feat: refactor ingest <- Ingest test fixtures update (#3048)
79856018
rbiseck3 Improve typing
84a6ee07
rbiseck3 expose max connections in CLI
c47a8a1c
rbiseck3 Add sequence diagram
91039e12
rbiseck3 remove print statement
acd32202
rbiseck3 Don't pass unset partition kwargs
2ab79942
rbiseck3 skip confluence
bd1315ea
ryannikolaidis feat: refactor ingest <- Ingest test fixtures update (#3059)
41361f49
rbiseck3 Add back in confluence tests
5c7cfbba
rbiseck3 fix s3 uploader
467a8878
rbiseck3 fix s3 uploader
11312bae
rbiseck3 Skip date created for minio as this will never be consistent
0d8b5b9d
rbiseck3 tidy shell
811f3bf1
rbiseck3 skip confluence
fef31348
ryannikolaidis feat: refactor ingest <- Ingest test fixtures update (#3060)
ff9ef036
rbiseck3 Add back in confluence tests
d1ce6949
rbiseck3 fix minio test
57ff33de
rbiseck3 Update use of chunking strategy in CLI inputs
ff52ecef
rbiseck3 fix chunk strategy cli param
eb092100
ryannikolaidis feat: refactor ingest <- Ingest test fixtures update (#3064)
aa537116
rbiseck3 force pushed from 51038072 to aa537116 1 year ago
ryannikolaidis approved these changes on 2024-05-21
rbiseck3 Add back in elasticsearch_elements_mappings.json into the es scripts dir
57b6f2b2
rbiseck3 Add back in elasticsearch_elements_mappings.json into the opensearch …
58e1f8ff
rbiseck3 enabled auto-merge 1 year ago
rbiseck3 merged 3eaf65a8 into main 1 year ago
rbiseck3 deleted the roman/refactor-ingest branch 1 year ago
