unstructured
79552ff7 - Refactor threshold to annotation_threshold and make it an optional parameter (#2537)

Committed 1 year ago
This change makes the annotation threshold for links configurable as an optional parameter. We needed it because we ran into issues extracting simple text links from PDF documents created with MS Word. The sample PDF from unstructured worked with the default value of 0.9, but the PDF generated with Word required a threshold of approximately 0.67.

We use unstructured together with langchain in an automated container deployment, so exposing the 'annotation_threshold' setting (renamed from 'threshold') as a parameter with a default value is very helpful.

---------

Co-authored-by: Michael Niestroj <michael.niestroj@unblu.com>
Co-authored-by: christinestraub <christinemstraub@gmail.com>
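The commit message does not show how the threshold is applied. The following is a minimal sketch, assuming that a link annotation is attached to a text element when the fraction of the link's rectangle that lies inside the element's bounding box exceeds the threshold. The function names and the (x1, y1, x2, y2) box convention are hypothetical illustrations, not unstructured's actual API. The sketch shows why a Word-generated link rectangle that sticks out past the text box can be rejected at 0.9 but accepted at about 0.67.

```python
from typing import List, Tuple

# Hypothetical bbox convention: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
BBox = Tuple[float, float, float, float]


def bbox_area(b: BBox) -> float:
    """Area of a box; degenerate (inverted) boxes have zero area."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection_area(a: BBox, b: BBox) -> float:
    """Area of the overlap between two boxes (zero if they do not overlap)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return bbox_area((x1, y1, x2, y2))


def links_within_element(
    link_bboxes: List[BBox],
    element_bbox: BBox,
    annotation_threshold: float = 0.9,  # hypothetical default mirroring the 0.9 in the commit
) -> List[BBox]:
    """Keep links whose area lies mostly inside the element's bounding box.

    A link is kept when (overlap area / link area) > annotation_threshold.
    A lower threshold tolerates link rectangles that extend past the text box.
    """
    kept = []
    for link in link_bboxes:
        area = bbox_area(link)
        if area and intersection_area(element_bbox, link) / area > annotation_threshold:
            kept.append(link)
    return kept


if __name__ == "__main__":
    element = (0.0, 0.0, 100.0, 10.0)
    # Hypothetical link rectangle: 70% of its area lies inside the element box.
    link = (10.0, -3.0, 40.0, 7.0)
    print(links_within_element([link], element))  # [] (0.7 is not > 0.9)
    print(links_within_element([link], element, annotation_threshold=0.67))
    # [(10.0, -3.0, 40.0, 7.0)]
```

Making the threshold a keyword argument with a default keeps existing callers unchanged while letting a deployment lower it for documents whose link rectangles only partially overlap the text.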