unstructured
6595632a - enhancement: backup text categorization (#1322)

Commit

2 years ago

enhancement: backup text categorization (#1322)

When `partition_pdf` runs with the `hi_res` strategy, some elements can come back with category `UncategorizedText`. This happens when the detection model fails to detect an element but we still recover it, either because its text was embedded in the PDF or because OCR found it. This commit attempts to categorize those elements using our text-based classification function, `element_from_text`.
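The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the `Element` dataclass, `classify_text`, and `recategorize` are hypothetical names, and `classify_text` is a toy stand-in for a text-based classifier such as `element_from_text`, with made-up heuristics. The sketch only shows the control flow: elements that already have a category are left alone, and only `UncategorizedText` elements are passed to the text classifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    # Hypothetical minimal element: just text plus a category label.
    text: str
    category: str = "UncategorizedText"


def classify_text(text: str) -> str:
    """Toy stand-in for a text-based classifier (assumed heuristics)."""
    stripped = text.strip()
    if stripped.startswith(("- ", "* ", "\u2022 ")):
        return "ListItem"
    if len(stripped.split()) <= 8 and stripped.istitle():
        return "Title"
    if stripped.endswith((".", "!", "?")):
        return "NarrativeText"
    # Nothing matched: leave the element uncategorized.
    return "UncategorizedText"


def recategorize(elements: List[Element]) -> List[Element]:
    """Re-run text classification only on uncategorized elements."""
    result = []
    for el in elements:
        if el.category == "UncategorizedText":
            result.append(Element(el.text, classify_text(el.text)))
        else:
            # The detection model already assigned a category; keep it.
            result.append(el)
    return result


elements = [
    Element("Quarterly Results"),
    Element("- first item"),
    Element("This happened when the model missed a block."),
    Element("col a | col b", category="Table"),
]
recategorized = recategorize(elements)
categories = [el.category for el in recategorized]
```

Elements that the text classifier cannot place stay `UncategorizedText`, so the fallback never makes the output worse than before.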

References

#1322 - enhancement: backup text categorization

Author

qued

Parents

c2853e4a
