unstructured
48bdf946 - feat: `partition_pdf()` support language specification for PaddleOCR (#3400)

Commit
1 year ago
Closes #3159.

This PR extends language specification to `PaddleOCR`, in addition to `TesseractOCR`. Users can now specify OCR languages for either OCR engine when using `partition_pdf()`.

### Testing

```
import os
from unstructured.partition.pdf import partition_pdf

# Select PaddleOCR as the OCR agent before partitioning.
os.environ["OCR_AGENT"] = "unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle"

elements = partition_pdf(
    filename=<file_path>,
    strategy=strategy,
    languages=["chi_sim"],  # Chinese (Simplified)
    infer_table_structure=True,
)
```
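The test above passes a Tesseract-style code (`chi_sim`) while PaddleOCR is the active agent, which suggests the codes are translated somewhere along the way. The commit message does not show how, so the following is only a minimal sketch of one way such a translation could work. The table `TESSERACT_TO_PADDLE` and the helper `to_paddle_language` are hypothetical names introduced for illustration, not part of `unstructured`, and the fallback to English is an assumption.

```python
# Hypothetical lookup table for illustration; NOT the mapping used by unstructured.
# Keys are Tesseract language codes, values are PaddleOCR language abbreviations.
TESSERACT_TO_PADDLE = {
    "eng": "en",
    "chi_sim": "ch",
    "fra": "french",
    "deu": "german",
    "jpn": "japan",
    "kor": "korean",
}


def to_paddle_language(languages, default="en"):
    """Return the PaddleOCR code for the first recognized Tesseract code.

    Falls back to `default` when the list is empty or nothing matches
    (the fallback behavior is an assumption of this sketch).
    """
    for code in languages:
        if code in TESSERACT_TO_PADDLE:
            return TESSERACT_TO_PADDLE[code]
    return default


if __name__ == "__main__":
    print(to_paddle_language(["chi_sim"]))
```

This sketch returns a single code because it assumes one language per OCR run; a real implementation may handle multiple languages differently.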