unstructured
refactor: `partition_pdf()` for `ocr_only` strategy
#1811
Merged
christinestraub merged 78 commits into main from refactor/partition-pdf-ocr_only
feat: update `ocr_only` strategy related code using `process_file_wit…
4cdc559f
feat: add functionality to get layout elements from ocr regions (`ocr…
d9c5f68f
feat: update `merge_out_layout_with_ocr_layout()` to perform grouping…
9eb10480
refactor: renaming...
3e111fbb
refactor: organization
b59ff5f6
refactor: revert renaming
8bf8b4ba
refactor: combine `get_ocr_layout_from_image()` and `get_ocr_text_fro…
c1a4d571
feat:
e49ab0e6
Merge branch 'main' into refactor/partition-pdf-ocr_only
2e2c4a19
feat: add an `Enum` for OCR sources
babd2ba1
feat: add functionality to get layout elements from ocr regions for `…
d3cbbe06
feat: add functionality to get `source` when merging text regions
49ef5553
Merge branch 'main' into refactor/partition-pdf-ocr_only
a5af33be
refactor: minor changes
2b70b080
refactor: separate `ocr_only` path from `hi_res` path
bee455f8
Merge branch 'main' into refactor/partition-pdf-ocr_only
51bcd4e9
refactor: update `get_ocr_data_from_image` to reflect changes in the …
30e7661a
test: fix lint errors
9e97cdc9
refactor: rename `entire_page_ocr` to `ocr_agent`
ed7c5998
chore: update required dependencies for `_partition_pdf_or_image_with…
d566de7b
feat: add constants for OCR agents
386b8a77
refactor: update ocr test cases
e026a960
chore: update changelog & version
7fa71cc3
christinestraub marked this pull request as ready for review 2 years ago
Merge branch 'main' into refactor/partition-pdf-ocr_only
38dc3a41
christinestraub requested a review from yuming-long 2 years ago
christinestraub requested a review from qued 2 years ago
christinestraub requested a review from cragwolfe 2 years ago
chore: update changelog
beb1980a
yuming-long commented on 2023-10-24
yuming-long commented on 2023-10-24
yuming-long commented on 2023-10-24
test: fix unit test errors
8ec73c59
feat: set `languages` metadata field
3cf72eb9
Merge branch 'main' into refactor/partition-pdf-ocr_only
3988b3cd
chore: add docstring to `get_ocr_data_from_image()`
e327db72
feat: revert setting `languages` metadata field for `hi_res` strategy
34fe9641
Merge branch 'main' into refactor/partition-pdf-ocr_only
d21b28ea
chore: fix lint errors
4c7ef536
christinestraub requested a review from yuming-long 2 years ago
chore: add notes to `get_page_layout_from_ocr()`
3b57a384
test: fix unit test errors
546e6186
Merge branch 'main' into refactor/partition-pdf-ocr_only
8e70433d
chore: update version
71468700
Merge branch 'main' into refactor/partition-pdf-ocr_only
1f6ec4b1
refactor: `partition_pdf()` for `ocr_only` strategy <- Ingest test fi…
d258bd54
yuming-long commented on 2023-10-25
qued commented on 2023-10-25
qued commented on 2023-10-25
qued commented on 2023-10-25
qued dismissed these changes on 2023-10-25
feat: update page layout element type by `element_from_text()`
85f8d42a
feat: disable sorting for `tesseract`
7d8edaaa
feat: update natural reading order evaluation script to skip drawing …
fcd55eca
refactor: reduce dependency on `unstructured-inference` format
82eca62e
Merge branch 'main' into refactor/partition-pdf-ocr_only
d8c51181
chore: update version
a306a130
chore: update changelog & version
ebb7adfe
feat: move `OCR_AGENT` to environment config & add utility function `…
d519ead6
test: update test case
22af55b6
refactor: `partition_pdf()` for `ocr_only` strategy <- Ingest test fi…
1446d27a
Merge branch 'main' into refactor/partition-pdf-ocr_only
edbe55c9
chore: update version
22af35e2
refactor: `partition_pdf()` for `ocr_only` strategy <- Ingest test fi…
8ab28481
christinestraub requested a review from yuming-long 2 years ago
yuming-long approved these changes on 2023-10-26
Merge branch 'main' into refactor/partition-pdf-ocr_only
3e397069
chore: update changelog & version
03f4bb2a
refactor: revert combining `get_ocr_layout_from_image()` and `get_ocr…
ad30d5fd
feat: remove bad `detection_origin`
d49d935c
test: add test cases
9c5ee2fc
christinestraub requested a review from qued 2 years ago
refactor: renaming...
fa294e85
feat: update `_ocr_data_to_elements` to return "UncategorizedText" el…
c0259267
refactor: merge test functions
99234342
test: update test cases
93b67f55
Merge branch 'main' into refactor/partition-pdf-ocr_only
023588a3
yuming-long commented on 2023-10-27
yuming-long commented on 2023-10-27
test: add test cases for xycut.py
90e9f732
refactor: remove unused functions
719c0a59
Merge branch 'main' into refactor/partition-pdf-ocr_only
bc340034
chore: update version
80bfe471
test: fix lint errors
47bce397
Merge branch 'main' into refactor/partition-pdf-ocr_only
13f4967d
chore: update changelog & version
36477ec3
chore: update log messages
a476c7b9
refactor: renaming...
be5c79ec
christinestraub enabled auto-merge 2 years ago
disabled auto-merge 2 years ago (manually disabled by user)
Merge branch 'main' into refactor/partition-pdf-ocr_only
f385bb87
Merge branch 'main' into refactor/partition-pdf-ocr_only
4b75f41c
refactor: merge multiple test functions related to "ocr_only" strateg…
026028ae
test: update test functions
fc0e9ab1
chore: update changelog & version
16b3b125
test: update test function
4beca658
chore: fix version
6f4fc014
test: fix lint errors
a5f3ba3c
cragwolfe approved these changes on 2023-10-30
christinestraub dismissed their stale review 2 years ago: already addressed
christinestraub merged 1f0c563e into main 2 years ago
christinestraub deleted the refactor/partition-pdf-ocr_only branch 2 years ago
Reviewers
cragwolfe
yuming-long
qued