Refactor threshold to annotation_threshold and make it an optional parameter (#2537)
This change makes the annotation threshold for links configurable as an
optional parameter.
The reason for the change is that we ran into issues extracting simple
text links from PDF documents that were created with MS Word. The sample
PDF from unstructured worked with the default value of 0.9, whereas the
PDF generated with Word needed a threshold of approximately 0.67.
We use unstructured together with langchain in an automated container
deployment, so being able to set 'annotation_threshold' (refactored
from 'threshold') instead of relying on the default is very helpful.
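
For illustration only, here is a minimal sketch of how an overlap
threshold like this might decide whether a link annotation belongs to a
text element. The helper names (overlap_fraction,
annotation_within_element) and the bounding-box layout are assumptions
made for this example and are not taken from unstructured's code; only
the parameter name 'annotation_threshold' and the 0.9 default come from
this change.

```python
from typing import Tuple

# Hypothetical bounding-box layout for this sketch: (x1, y1, x2, y2).
BBox = Tuple[float, float, float, float]


def overlap_fraction(annotation: BBox, element: BBox) -> float:
    """Return the share of the annotation's area that lies inside the element."""
    ax1, ay1, ax2, ay2 = annotation
    ex1, ey1, ex2, ey2 = element
    # Width and height of the intersection rectangle (0 if the boxes are disjoint).
    width = max(0.0, min(ax2, ex2) - max(ax1, ex1))
    height = max(0.0, min(ay2, ey2) - max(ay1, ey1))
    annotation_area = (ax2 - ax1) * (ay2 - ay1)
    if annotation_area <= 0:
        return 0.0
    return (width * height) / annotation_area


def annotation_within_element(
    annotation: BBox,
    element: BBox,
    annotation_threshold: float = 0.9,  # default value mentioned in this change
) -> bool:
    """Accept the annotation when enough of it overlaps the element."""
    return overlap_fraction(annotation, element) >= annotation_threshold


# An annotation box that sticks out of its text element: about two
# thirds of its area overlaps, so the default of 0.9 rejects it while a
# lower, caller-supplied threshold accepts it.
element_box = (0.0, 0.0, 100.0, 20.0)
annotation_box = (0.0, 0.0, 10.0, 30.0)
rejected_by_default = annotation_within_element(annotation_box, element_box)
accepted_when_lowered = annotation_within_element(
    annotation_box, element_box, annotation_threshold=0.65
)
```

Keeping the parameter optional with the previous default means existing
callers see no change in behavior, while deployments that hit the
Word-generated case can pass a lower value.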
---------
Co-authored-by: Michael Niestroj <michael.niestroj@unblu.com>
Co-authored-by: christinestraub <christinemstraub@gmail.com>