unstructured
feat: `partition_pdf()` support language specification for PaddleOCR
#3400
Merged

christinestraub
christinestraub refactor: pass through ocr_languages
5b53799c
christinestraub feat: convert TesseractOCR language code to PaddleOCR language code
2c31ec8c
christinestraub feat: add support for specifying OCR languages when instantiating an …
29b4cbe3
christinestraub refactor: remove `ocr_languages` param
e0e1227d
christinestraub refactor: update `image_or_pdf_to_dataframe` to handle changes in `ge…
3d50b31e
christinestraub Merge branch 'refs/heads/main' into feat/pdf-support-paddleocr-langua…
725e6253
christinestraub test: update unit test
a54166f0
christinestraub chore: update changelog & version
66b56a93
christinestraub requested a review from MthwRobinson 1 year ago
christinestraub requested a review from scanny 1 year ago
christinestraub marked this pull request as ready for review 1 year ago
christinestraub feat: handle invalid language code when converting Tesseract language…
73e9346a
scanny approved these changes on 2024-07-16
christinestraub test: add unit test for `tesseract_to_paddle_language()`
106518be
christinestraub Merge branch 'refs/heads/main' into feat/pdf-support-paddleocr-langua…
2c9d9ea6
christinestraub chore: update log
df6ace61
christinestraub feat: remove constant `DEFAULT_PADDLE_LANG`
dd66e749
christinestraub refactor: update default language setting
02dcec3b
christinestraub merged 48bdf946 into main 1 year ago
christinestraub deleted the feat/pdf-support-paddleocr-language-specification branch 1 year ago
