unstructured
Feat/1136 elements ordering for pdf
#1161
Merged

cragwolfe merged 38 commits into main from feat/1136-elements-ordering-for-pdf
christinestraub chore: add example docs
dfaf0f81
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
87eb1f8f
christinestraub feat: add base scripts to evaluate `xy-cut` sorting result - evaluation
99445a1f
christinestraub feat: scale coordinates to fit actual image size - evaluation
75cfaa89
christinestraub feat: pass `PIL.Image` objects instead of image paths - evaluation
c349af98
christinestraub feat: separate annotated images by pdf `strategy` - evaluation
ed40bfce
christinestraub feat: handle an exception if `PageBreak` is `None` for the last page …
d7e559c3
christinestraub refactor: rename the evaluation script - evaluation
56cfc16e
christinestraub refactor: organization - evaluation
af451596
christinestraub feat: ensure that the result of the `xy-cut` ordering is not affected…
c2b45008
christinestraub feat: add functionality to switch sorting modes
eba56937
christinestraub feat: add functionality to switch sorting modes for `hi_res`
3531aeda
christinestraub feat: update the evaluation script - evaluation
88d813f9
christinestraub feat: add jupyter notebook to provide evaluation for `xy-cut` sorting
3b540ea5
christinestraub test: fix lint errors
8080c83b
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
dc9ca8f8
christinestraub chore: update changelog & version
e90421be
christinestraub requested a review from cragwolfe 2 years ago
christinestraub chore: include a link to the original repo in the docstring of `unstr…
d1f13dba
christinestraub test: fix lint errors
fccf6839
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
314a868a
christinestraub refactor: move `document_to_element_list` from `file_utils/filetype.p…
114ae5b6
christinestraub feat: add `sortable` param to `document_to_element_list` to avoid sor…
baf86cd5
christinestraub test: fix lint errors
c9310ebd
christinestraub feat: add functionality to skip sorting on empty elements
5b7ec64d
christinestraub test: update test cases
18fb1979
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
8740b0fa
cragwolfe Merge branch 'main' into feat/1136-elements-ordering-for-pdf
e273ebf1
ryannikolaidis Feat/1136 elements ordering for pdf <- Ingest test fixtures update (#…
614be6fd
cragwolfe Merge branch 'main' into feat/1136-elements-ordering-for-pdf
8a5f19bf
christinestraub feat: optionally import `sort_page_elements()` in `common.py`
eed362dc
christinestraub chore: remove `evaluate_xy_cut_sorting.ipynb` & create a Google Colab…
9b27abca
christinestraub feat: import `sort_page_elements()` only if `cv2` and `numpy` exist
01303b1c
christinestraub chore: update README
61128cd3
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
2c770672
christinestraub feat: apply basic sorting by default for fast `strategy` to avoid non…
9da3efd5
christinestraub test: fix lint errors
509429f7
ryannikolaidis Feat/1136 elements ordering for pdf <- Ingest test fixtures update (#…
9d2af02d
christinestraub Merge branch 'main' into feat/1136-elements-ordering-for-pdf
755885bf
cragwolfe approved these changes on 2023-08-25
cragwolfe merged 483b09b3 into main 2 years ago
cragwolfe deleted the feat/1136-elements-ordering-for-pdf branch 2 years ago
