unstructured
237d04c8 - feat: improve natural reading order by filtering OCR results (#1768)

Committed 2 years ago
### Summary

Some OCR elements whose text consists only of spaces have bounding boxes that span the full page width. These boxes prevent `xycut` sorting from producing the expected reading order. The logic that parses OCR results now removes any element whose text consists only of spaces (more than one space).

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
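The commit message describes the filter but does not show its code. Below is a minimal sketch of that filtering rule, not the actual `unstructured` implementation. `OCRElement`, `is_space_only`, and `filter_space_only_elements` are hypothetical names. The sketch assumes each OCR result carries a text string and an `(x1, y1, x2, y2)` bounding box. It reads "only spaces (more than one space)" as "two or more characters, all of them spaces", so an element that is a single space is kept.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OCRElement:
    """Hypothetical stand-in for a parsed OCR result: text plus a bounding box."""
    text: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def is_space_only(text: str) -> bool:
    """Assumed rule: True if text has more than one character and all are spaces."""
    return len(text) > 1 and text.strip(" ") == ""


def filter_space_only_elements(elements: List[OCRElement]) -> List[OCRElement]:
    """Drop space-only elements before reading-order (xycut) sorting sees them."""
    return [el for el in elements if not is_space_only(el.text)]


# Usage: the space-only element with a full-page-width box is removed.
elements = [
    OCRElement("Title", (50, 40, 300, 70)),
    OCRElement("    ", (0, 80, 1000, 100)),  # full-width box, spaces only
    OCRElement("Body text", (50, 110, 600, 140)),
]
kept = filter_space_only_elements(elements)
print([el.text for el in kept])  # ['Title', 'Body text']
```

Filtering before sorting means the wide, empty box can no longer span every vertical gap on the page. Such a box would otherwise prevent a region-splitting sort like `xycut` from finding clean cuts between columns.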