unstructured
2931cb38 - fix: handle KeyError: 'N' for certain pdfs (#2072)

Commit
2 years ago
fix: handle KeyError: 'N' for certain pdfs (#2072)

Closes #2059.

We've found some PDFs that throw an error in pdfminer. These files use an ICCBased color profile but do not include the expected `N` value. As a workaround, we can wrap pdfminer and drop any colorspace info, since we don't need to render the document.

To verify, try to partition the document in the linked issue.

```
from unstructured.partition.auto import partition
elements = partition(filename="google-2023-environmental-report_condensed.pdf", strategy="fast")
```

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
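
The workaround described in the commit message can be illustrated with a minimal, standard-library-only sketch. It is not the code from this commit: the helper name `drop_colorspace` and the sample dictionary are hypothetical, and the sketch only assumes that a PDF page's resource dictionary carries color space definitions under a `ColorSpace` key. The idea is that if the color space entries are removed before the page is interpreted, a lookup of the missing `N` value never happens, which is acceptable when only text is being extracted.

```python
from typing import Any, Dict, Optional


def drop_colorspace(resources: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of a page resource dictionary without its ColorSpace entry.

    Hypothetical helper for illustration; the input dictionary is not mutated.
    """
    if not resources:
        return {}
    return {key: value for key, value in resources.items() if key != "ColorSpace"}


# Hypothetical resource dictionary: the ICCBased entry lacks the "N" value,
# which is the situation the commit message describes.
page_resources = {
    "Font": {"F1": "font-object"},
    "ColorSpace": {"CS0": ["ICCBased", {"Alternate": "DeviceRGB"}]},
}

cleaned = drop_colorspace(page_resources)
```

In a real wrapper, a step like this would presumably run on each page's resources before they are handed to the pdfminer interpreter; where exactly that hook lives is an assumption here and is not shown in the commit message.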