unstructured
3ec3673d - feat: staging function to extract element text into one string (#1741)

Commit
2 years ago
### Summary

To enable larger-scale testing of the new text extraction metrics, create a helper function that gets the clean, concatenated text (CCT) from partitioned elements.

### Test

Partition any file, then pass the resulting elements into the new `elements_to_text` function. The output can be tested both as a string and as a text file.

```
from unstructured.partition.auto import partition
from unstructured.staging.base import elements_to_text

elements = partition(filename="example-docs/chevron-page.pdf", strategy="hi_res")
elements_text = elements_to_text(elements, "output-text-file.txt")
print(elements_text)
```
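
For readers without the library installed, the idea behind the helper can be sketched in plain Python. This is an illustration under stated assumptions, not the code added in this commit: `FakeElement` and `join_element_text` are hypothetical names, and the sketch assumes each element exposes a `text` attribute, that empty text is skipped, and that the pieces are joined with newlines.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class FakeElement:
    """Hypothetical stand-in for a partitioned element; only `text` matters here."""

    text: str


def join_element_text(
    elements: Iterable[object],
    filename: Optional[str] = None,
    encoding: str = "utf-8",
) -> str:
    """Concatenate element text into one string, optionally writing it to a file.

    Assumption: elements with missing or empty text are skipped and the rest
    are joined with newlines. The real helper may differ in both respects.
    """
    parts = [e.text for e in elements if getattr(e, "text", None)]
    joined = "\n".join(parts)
    if filename is not None:
        # Write the concatenated text so it can be compared against a reference.
        Path(filename).write_text(joined, encoding=encoding)
    return joined


if __name__ == "__main__":
    sample = [FakeElement("Title"), FakeElement(""), FakeElement("Body text.")]
    print(join_element_text(sample))
```

Unlike the example above, this sketch always returns the joined string, so the same call can be used whether or not a file is written.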