unstructured
Refactor: support merging `extracted` layout with `inferred` layout
#2158
Merged

christinestraub
christinestraub feat: add functionality to merge `inferred` with `extracted` when `fi…
897f97ca
christinestraub feat: add functionality to merge `inferred` with `extracted` when `fi…
57eba400
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
17ec904b
christinestraub feat: sort extracted layout by deterministic ordering
e8880476
christinestraub chore: add force `pip install -e .`
c637ed03
christinestraub chore: update changelog & version
68aacd34
christinestraub fix: lint
8669d8b6
christinestraub chore: update `flake8` config to exclude `unstructured-inference` di…
bb6a16a4
christinestraub feat: reflect added `Source.PDFMINER` constant
f66cda65
christinestraub chore: update ci
0e8e4664
christinestraub refactor: import `order_layout` within function
646a29d7
christinestraub test: fix lint errors
f8f004dd
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
8cd75a40
christinestraub test: fix unit test errors
d2e2e071
christinestraub refactor: organize files for partitioning pdf/image
4d2d1900
christinestraub refactor: add a new module `pdfminer_processing`
4abefa92
christinestraub feat: update `_merge_inferred_with_extracted()` to get image size fro…
5978279d
christinestraub refactor: `_merge_inferred_with_extracted()`
1a4083af
christinestraub test: update module import
f0be24c9
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
fbfe8def
christinestraub chore: update version
62b1513d
christinestraub feat: use elements returned by `inference.PageLayout.get_elements_fro…
dff68e63
christinestraub fix: lint errors
149444a9
christinestraub refactor: move code related to `pdfminer` patch from `unstructured-in…
02190409
christinestraub test: fix unit test errors
a42b7e6a
christinestraub refactor: move `_merge_inferred_with_extracted()` to pdfminer_process…
383c4965
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
d1489509
christinestraub test: fix lint errors
c204cf54
christinestraub feat: import modules depend on `unstructured_inference` library only …
2603374b
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
62ea5e86
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
7de19d57
christinestraub refactor: use `init_pdfminer()` in `_open_pdfminer_pages_generator()`
e6f6511a
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
d8bf20c9
christinestraub chore: update changelog & version
dd889995
christinestraub chore: update ci
1327055b
christinestraub marked this pull request as ready for review 2 years ago
christinestraub requested a review from cragwolfe 2 years ago
christinestraub requested a review from qued 2 years ago
christinestraub requested a review from yuming-long 2 years ago
christinestraub requested a review from benjats07 2 years ago
christinestraub feat: use the `open_pdfminer_pages_generator()` procedure in the `hi_…
fe29e79e
christinestraub chore: revert all CI yaml changes
4126e873
christinestraub chore: bump unstructured-inference==0.7.17
d801ed9c
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
d2fa91f1
christinestraub chore: make pip-compile
651221f5
christinestraub fix: dependency path error when running pip-compile
6bf43d7e
christinestraub chore: make pip-compile
f0f07ab5
benjats07 approved these changes on 2023-12-01
christinestraub Merge branch 'main' into refactor/pdf_text_extraction_for_hi_res
4fba0b66
christinestraub chore: make pip-compile
bcea80fd
christinestraub chore: update version
f2e5128b
christinestraub merged 69d0ee1a into main 2 years ago
christinestraub deleted the refactor/pdf_text_extraction_for_hi_res branch 2 years ago
