unstructured
d64c57d3 - feat: consider rotated text as low fidelity (#4190)

Commit
4 days ago
feat: consider rotated text as low fidelity (#4190)

This PR updates the function `is_text_embedded`:

- It now considers whether chars are invisible or rotated (and, as a result, includes some refactoring of variable names).
- Rotated text elements can have the wrong character order compared to natural reading order; if fed into downstream applications such as text embedding, the element loses its semantic meaning.
- As a result, this update flags texts with too many rotated characters as only partially embedded: the source is technically embedded, but it may need post-processing to be useful.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: badGarnet <badGarnet@users.noreply.github.com>
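The idea described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the `Char` class, the helper names, the `_sketch` suffix, and the `threshold` value are all hypothetical, and the sketch assumes each character carries a PDF-style text matrix `(a, b, c, d, e, f)` from which a rotation angle can be derived.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Matrix = Tuple[float, float, float, float, float, float]


@dataclass
class Char:
    """Hypothetical stand-in for a character extracted from a PDF page."""
    text: str
    matrix: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity: upright text
    invisible: bool = False


def is_rotated(char: Char, tol_degrees: float = 1.0) -> bool:
    """Return True if the character's baseline is rotated away from horizontal."""
    a, b, _, _, _, _ = char.matrix
    angle = math.degrees(math.atan2(b, a))
    return abs(angle) > tol_degrees


def low_fidelity_ratio(chars: Sequence[Char]) -> float:
    """Fraction of characters that are invisible or rotated."""
    if not chars:
        return 0.0
    flagged = sum(1 for ch in chars if ch.invisible or is_rotated(ch))
    return flagged / len(chars)


def is_text_embedded_sketch(chars: Sequence[Char], threshold: float = 0.1) -> bool:
    """Treat text as fully embedded only if few chars are invisible or rotated.

    The threshold is an assumed value chosen for illustration.
    """
    if not chars:
        return False  # assumption: no characters means no embedded text
    return low_fidelity_ratio(chars) <= threshold
```

With this sketch, a page of upright characters counts as embedded, while a page where half the characters use a 90-degree matrix such as `(0, 1, -1, 0, 0, 0)` does not, which mirrors the "only partially embedded" flag described above.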