unstructured
65344117 - enhancement: entire page OCR output included with hi_res (#1263)

Commit
2 years ago
enhancement: entire page OCR output included with hi_res (#1263)

Bumps unstructured-inference==0.5.19 to bring in @christinestraub's enhancement: https://github.com/Unstructured-IO/unstructured-inference/pull/186

This is a **massive** improvement: previously, text was omitted from `hi_res` output if the layout model had not put a bounding box around it.

In addition, the xycut sorting algorithm generally does a good job of ordering the merged bboxes (OCR text not found by the layout model, together with the layout-model bboxes) into "natural reading order."

More details: https://github.com/Unstructured-IO/unstructured-inference/pull/186#issuecomment-1700438645

Bonus: changelog fix.
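The merge described in this commit message can be pictured with a small, self-contained sketch. It is not the code from unstructured-inference: the `Box` type, `merge_uncovered_ocr`, and the 0.5 coverage threshold are hypothetical names and values chosen for illustration, and a plain top-to-bottom, left-to-right sort stands in for the xycut algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text region: corner coordinates plus the text inside."""
    x1: float
    y1: float
    x2: float
    y2: float
    text: str = ""

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0.0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, width) * max(0.0, height)


def merge_uncovered_ocr(layout, ocr, covered_threshold=0.5):
    """Keep every layout box and add the OCR boxes that no layout box covers.

    An OCR box counts as covered when at least `covered_threshold` of its
    area lies inside a single layout box (an assumed rule, for illustration).
    """
    merged = list(layout)
    for region in ocr:
        if region.area == 0:
            continue  # degenerate box, nothing to compare
        covered = any(
            intersection_area(region, box) / region.area >= covered_threshold
            for box in layout
        )
        if not covered:
            merged.append(region)
    # Simplified reading order: top-to-bottom, then left-to-right.
    # This is a stand-in for xycut, not an implementation of it.
    return sorted(merged, key=lambda b: (b.y1, b.x1))


layout_boxes = [
    Box(0, 0, 100, 20, "Title"),
    Box(0, 50, 100, 90, "Body"),
]
ocr_boxes = [
    Box(0, 0, 100, 20, "Title"),  # already covered by a layout box
    Box(0, 25, 100, 40, "Caption missed by layout model"),
]

result = merge_uncovered_ocr(layout_boxes, ocr_boxes)
texts = [box.text for box in result]
```

In this toy page the caption has no layout bounding box, so it would have been dropped before the change; after the merge it is kept and sorted between the title and the body.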