unstructured
Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`
#1579
Merged

cragwolfe merged 105 commits into main from yuming/refactor_ocr
yuming-long stage
4468863c
yuming-long stage
df854660
yuming-long need to update test
0abd264b
yuming-long stage
1385b33f
yuming-long Merge branch 'main' into yuming/refactor_ocr
8e924b69
yuming-long stage
3f0c0dba
yuming-long Merge branch 'main' into yuming/refactor_ocr
9f66d684
yuming-long change to import
327aa5bc
yuming-long stage
35376ab0
yuming-long revert code back to 5.31 inference
468e1e56
yuming-long update mock test
97962c14
yuming-long some todo note
bd6107b2
yuming-long Revert "some todo note"
58c38ace
yuming-long fix test
593f23eb
yuming-long TODO...
9874b630
yuming-long fix all tests
8d8a0d95
yuming-long cancel out the wrong guy
1d0a81bc
yuming-long add paddle ocr func
38c8db3d
yuming-long feel like missing some texts...
fdbe8a95
yuming-long update todo
cac87a6e
yuming-long Merge branch 'main' into yuming/refactor_ocr
aaee4cd7
yuming-long test ingest
db23355f
ryannikolaidis null <- Ingest test fixtures update (#1571)
bf7d427e
yuming-long tidy and add paddle entire page
21d598a2
yuming-long test file and more doc string
2978d919
yuming-long todo note
04f4a813
yuming-long note todo
54bfde2a
yuming-long move test to unst
c58621a4
yuming-long let ci depends on inference branch
0052d923
yuming-long Merge branch 'main' into yuming/refactor_ocr
58a2ab45
yuming-long changelog version
f9ec23e9
yuming-long lint check
afaa5f30
yuming-long no source
19a9b702
ryannikolaidis Yuming/refactor ocr <- Ingest test fixtures update (#1582)
ee6859a3
yuming-long update test fixture CI
56374de9
yuming-long update copied code
652c3f40
yuming-long Merge branch 'main' into yuming/refactor_ocr
ff628cef
yuming-long update ci
6ea82c2a
yuming-long avoid conflict
8cab7b21
yuming-long Revert "avoid conflict"
46633611
yuming-long Merge branch 'main' into yuming/refactor_ocr
f3c0df8f
yuming-long duplicate name
539f4c5d
yuming-long new line?
fb1eaf11
ryannikolaidis Yuming/refactor ocr <- Ingest test fixtures update (#1617)
593b9c50
yuming-long add individual blockers to ocr mode
cf7901a5
yuming-long moe mote
f6684f26
yuming-long fix bug for tests
e92b7142
yuming-long nit on mock ocr func name
abb8f675
yuming-long should fix all TODO with no ticket number
79151282
yuming-long add docstring
22ad3b63
yuming-long assume to use image from page.image
0539dd14
yuming-long bug fix
cea28da0
yuming-long Revert "assume to use image from page.image"
e3a6577b
yuming-long add ocr text
0426811c
yuming-long from file test
7204ec6b
yuming-long more test coverage
cd9473ed
yuming-long Merge branch 'main' into yuming/refactor_ocr
22c1f6d8
yuming-long commented on 2023-10-03
yuming-long rewrite try/except
43093e70
yuming-long revert some fixed changes; only import paddle in func
398c96a4
yuming-long changed the title from "Yuming/refactor ocr" to "Chore: support entire page OCR with `ocr_mode` and `ocr_languages`" 2 years ago
yuming-long marked this pull request as ready for review 2 years ago
yuming-long requested a review from christinestraub 2 years ago
yuming-long requested a review from benjats07 2 years ago
yuming-long requested a review from qued 2 years ago
benjats07 approved these changes on 2023-10-04
yuming-long add pip install -e . right before ingest update
7691a114
yuming-long update for CI test
0c5b0a41
yuming-long revert all ci yaml changes
b9ea113a
ryannikolaidis Chore: support entire page OCR with `ocr_mode` and `ocr_languages` <-…
0725bea3
qued dismissed these changes on 2023-10-04
yuming-long install branch right before test
ef44c8c9
yuming-long Merge branch 'main' into yuming/refactor_ocr
404fb713
ryannikolaidis Chore: support entire page OCR with `ocr_mode` and `ocr_languages` <-…
d3b5a8f4
christinestraub changed the title from "Chore: support entire page OCR with `ocr_mode` and `ocr_languages`" to "Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`" 2 years ago
christinestraub refactor: add `OCRMode` enum
cd82e31c
yuming-long move tesseract env; move constant
5cdf3274
christinestraub dismissed these changes on 2023-10-04
yuming-long Merge branch 'main' into yuming/refactor_ocr
db2e48b7
yuming-long add padding logic to individual blocks
f61ee9ac
yuming-long requested a review from christinestraub 2 years ago
christinestraub Merge branch 'main' into yuming/refactor_ocr
904d85e4
christinestraub refactor: keep original element when adding padding
21e93c1e
christinestraub test: add test cases for `pad_element_bboxes()`
463d85f2
christinestraub refactor: remove unused index
68e41f03
christinestraub refactor: fix spelling mistakes
819047aa
christinestraub Merge branch 'main' into yuming/refactor_ocr
3293f9fe
yuming-long fix test: add index to title since xy cut
9c8ea7e9
yuming-long fix test: update title output since ocr change it
6c12c246
yuming-long lint
d4219491
christinestraub feat: update logic to merge "out layout" (returned by `unstructured-i…
6ac3505c
yuming-long fix test and doc nit inferred_layout -> out_layout
223038e5
yuming-long Merge branch 'main' into yuming/refactor_ocr
aa17d8e6
yuming-long Merge branch 'main' into yuming/refactor_ocr
dfeba46c
christinestraub refactor: keep passing parameters used to extract images from PDF's t…
2260b997
yuming-long update ocr output in test
428ba60d
yuming-long revert force pip install -e .
ae974498
yuming-long pip unstructured-inference==0.7.0 and dependency conflicts
73f34535
yuming-long Merge branch 'main' into yuming/refactor_ocr
b6881e83
yuming-long version bump
73ef72f6
yuming-long add test coverage
88fbf5c1
yuming-long Merge branch 'main' into yuming/refactor_ocr
a93644db
yuming-long add coverage: skip coverage check on paddle init
92dc988b
yuming-long Merge branch 'main' into yuming/refactor_ocr
a63b07e0
ryannikolaidis Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
ea323e5e
christinestraub Merge branch 'main' into yuming/refactor_ocr
4e349aef
christinestraub fix: element with `text=None` in final_layout
25b7ea5f
christinestraub Merge branch 'main' into yuming/refactor_ocr
d19a55f1
ryannikolaidis Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
a3112598
christinestraub chore: update ingest test fixtures
856d3fff
christinestraub chore: revert ingest test fixtures
3bd6256e
christinestraub chore: bump unstructured-inference==0.7.2 & make pip-compile
cc361499
christinestraub Merge branch 'main' into yuming/refactor_ocr
b29f8bc0
christinestraub chore: update version
e5b69253
ryannikolaidis Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
a1484864
christinestraub chore: update dependencies
3957fa68
cragwolfe approved these changes on 2023-10-06
christinestraub approved these changes on 2023-10-06
cragwolfe dismissed their stale review 2 years ago
stale request
cragwolfe dismissed their stale review 2 years ago
will create follow-on GitHub issues for minor ingest diffs. overall an improvement.
cragwolfe merged dcd6d0ff into main 2 years ago
cragwolfe deleted the yuming/refactor_ocr branch 2 years ago
