Refactor: support entire page OCR with `ocr_mode` and `ocr_languages` #1579
stage
4468863c
stage
df854660
need to update test
0abd264b
stage
1385b33f
Merge branch 'main' into yuming/refactor_ocr
8e924b69
stage
3f0c0dba
Merge branch 'main' into yuming/refactor_ocr
9f66d684
change to import
327aa5bc
stage
35376ab0
revert code back to 5.31 inference
468e1e56
update mock test
97962c14
some todo note
bd6107b2
Revert "some todo note"
58c38ace
fix test
593f23eb
TODO...
9874b630
fix all tests
8d8a0d95
cancel out the wrong guy
1d0a81bc
add paddle ocr func
38c8db3d
feel like missing some texts...
fdbe8a95
update todo
cac87a6e
Merge branch 'main' into yuming/refactor_ocr
aaee4cd7
test ingest
db23355f
null <- Ingest test fixtures update (#1571)
bf7d427e
tidy and add paddle entire page
21d598a2
test file and more doc string
2978d919
todo note
04f4a813
note todo
54bfde2a
move test to unst
c58621a4
let ci depends on inference branch
0052d923
Merge branch 'main' into yuming/refactor_ocr
58a2ab45
changelog version
f9ec23e9
lint check
afaa5f30
no source
19a9b702
Yuming/refactor ocr <- Ingest test fixtures update (#1582)
ee6859a3
update test fixture ci
56374de9
update copied code
652c3f40
Merge branch 'main' into yuming/refactor_ocr
ff628cef
update ci
6ea82c2a
avoid conflict
8cab7b21
Revert "avoid conflict"
46633611
Merge branch 'main' into yuming/refactor_ocr
f3c0df8f
depilicate name
539f4c5d
new line?
fb1eaf11
Yuming/refactor ocr <- Ingest test fixtures update (#1617)
593b9c50
add individual blocks to ocr mode
cf7901a5
more note
f6684f26
fix bug for tests
e92b7142
nit on mock ocr func name
abb8f675
should fix all TODO with no ticket number
79151282
add docstring
22ad3b63
assume to use image from page.image
0539dd14
bug fix
cea28da0
Revert "assume to use image from page.image"
e3a6577b
add ocr text
0426811c
from file test
7204ec6b
more test coverage
cd9473ed
Merge branch 'main' into yuming/refactor_ocr
22c1f6d8
rewrite try except
43093e70
revert some fixed changes; only import paddle in func
398c96a4
yuming-long
changed the title from "Yuming/refactor ocr" to "Chore: support entire page OCR with `ocr_mode` and `ocr_languages`" 2 years ago
yuming-long
marked this pull request as ready for review 2 years ago
benjats07
approved these changes
on 2023-10-04
add pip install -e . right before ingest update
7691a114
update for ci test
0c5b0a41
revert all ci yaml changes
b9ea113a
Chore: support entire page OCR with `ocr_mode` and `ocr_languages` <-…
0725bea3
qued
dismissed these changes
on 2023-10-04
install branch right before test
ef44c8c9
Merge branch 'main' into yuming/refactor_ocr
404fb713
Chore: support entire page OCR with `ocr_mode` and `ocr_languages` <-…
d3b5a8f4
christinestraub
changed the title from "Chore: support entire page OCR with `ocr_mode` and `ocr_languages`" to "Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`" 2 years ago
refactor: add `OCRMode` enum
cd82e31c
move tesseract env; move constant
5cdf3274
Merge branch 'main' into yuming/refactor_ocr
db2e48b7
add padding logic to individual blocks
f61ee9ac
Merge branch 'main' into yuming/refactor_ocr
904d85e4
refactor: keep original element when adding padding
21e93c1e
test: add test cases for `pad_element_bboxes()`
463d85f2
refactor: remove unused index
68e41f03
refactor: fix spelling mistakes
819047aa
Merge branch 'main' into yuming/refactor_ocr
3293f9fe
fix test: add index to title since xy cut
9c8ea7e9
fix test: update title output since ocr change it
6c12c246
lint
d4219491
feat: update logic to merge "out layout" (returned by `unstructured-i…
6ac3505c
fix test and doc nit inferred_layout -> out_layout
223038e5
Merge branch 'main' into yuming/refactor_ocr
aa17d8e6
Merge branch 'main' into yuming/refactor_ocr
dfeba46c
refactor: keep passing parameters used to extract images from PDF's t…
2260b997
update ocr output in test
428ba60d
revert force pip install -e .
ae974498
pip unstructured-inference==0.7.0 and dep conflicts
73f34535
Merge branch 'main' into yuming/refactor_ocr
b6881e83
version bump
73ef72f6
add test coverage
88fbf5c1
Merge branch 'main' into yuming/refactor_ocr
a93644db
add coverage: skip coverage check on paddle init
92dc988b
Merge branch 'main' into yuming/refactor_ocr
a63b07e0
Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
ea323e5e
Merge branch 'main' into yuming/refactor_ocr
4e349aef
fix: element with `text=None` in final_layout
25b7ea5f
Merge branch 'main' into yuming/refactor_ocr
d19a55f1
Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
a3112598
chore: update ingest test fixtures
856d3fff
chore: revert ingest test fixtures
3bd6256e
chore: bump unstructured-inference==0.7.2 & make pip-compile
cc361499
Merge branch 'main' into yuming/refactor_ocr
b29f8bc0
chore: update version
e5b69253
Refactor: support entire page OCR with `ocr_mode` and `ocr_languages`…
a1484864
chore: update dependencies
3957fa68
cragwolfe
approved these changes
on 2023-10-06
cragwolfe
dismissed their stale review
2 years ago
cragwolfe
dismissed their stale review
2 years ago
cragwolfe
merged
dcd6d0ff
into main 2 years ago
cragwolfe
deleted the yuming/refactor_ocr branch 2 years ago