unstructured
enhancement: add "ocr_only" strategy for PDFs
#553
Merged

MthwRobinson merged 23 commits into main from enhancement/ocr-only-for-pdfs
MthwRobinson add tests for validating strategy
f652048f
MthwRobinson refactor into determine_pdf_strategy function
919ace23
MthwRobinson refactor pdf strategies into strategies
30d31d69
MthwRobinson remove commented out code
a1d55e35
MthwRobinson remove unreachable code
de2d7ae9
MthwRobinson add in handling for image types
30f1739f
MthwRobinson a little more refactoring
4c1a9ae7
MthwRobinson import ocr partitioning for images
e211c29a
MthwRobinson catch warnings, partition type for valid strategies
8e39cd4f
MthwRobinson fallback to ocr_only from fast
c35e7b85
MthwRobinson fallback logic for hi_res
e77dff96
MthwRobinson test for fallback to ocr only
e849651e
MthwRobinson fallback logic for ocr_only
7488e278
MthwRobinson more tests for fallback logic
93af828e
MthwRobinson update doc strings
4485e901
MthwRobinson version and changelog
471dc77c
MthwRobinson linting, linting, linting
34e06726
MthwRobinson update docs to include notes about strategy
07b30514
MthwRobinson Merge branch 'main' into enhancement/ocr-only-for-pdfs
52c5db1c
MthwRobinson marked this pull request as ready for review 2 years ago
qued approved these changes on 2023-05-08
MthwRobinson fix typos
6c2b9910
MthwRobinson Merge branch 'enhancement/ocr-only-for-pdfs' of github.com:Unstructur…
6c82c513
MthwRobinson change back patched filename
f487924d
MthwRobinson bump version for release
eeb5abbb
MthwRobinson enabled auto-merge (squash) 2 years ago
MthwRobinson merged 3d3f3df3 into main 2 years ago
MthwRobinson deleted the enhancement/ocr-only-for-pdfs branch 2 years ago
