unstructured
9459af43 - Fix: element extraction not working when using "auto" strategy for pdf (#2324)

Commit

2 years ago

Fix: element extraction not working when using "auto" strategy for pdf (#2324)

Closes #2323.

### Summary

- update logic to return "hi_res" if either `extract_images_in_pdf` or `extract_element_types` is set
- refactor: remove unused `file` parameter from `determine_pdf_or_image_strategy()`

### Testing

```
from unstructured.documents.elements import ElementType
from unstructured.partition.pdf import partition_pdf

# With the default "auto" strategy, requesting element extraction
# should now route the PDF through "hi_res".
elements = partition_pdf(
    filename="example-docs/embedded-images-tables.pdf",
    extract_element_types=["Image"],
    extract_to_payload=True,
)

image_elements = [el for el in elements if el.category == ElementType.IMAGE]
print(image_elements)
```
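For readers who want the gist of the first summary item, here is a minimal sketch of the selection rule it describes. This is not the library's code: the function name `choose_pdf_strategy_sketch`, its signature, and the `fallback` parameter are hypothetical, and the real `determine_pdf_or_image_strategy()` may consider other inputs when resolving "auto".

```python
from typing import List, Optional


def choose_pdf_strategy_sketch(
    strategy: str = "auto",
    extract_images_in_pdf: bool = False,
    extract_element_types: Optional[List[str]] = None,
    fallback: str = "fast",  # hypothetical stand-in for whatever "auto" would otherwise pick
) -> str:
    """Illustrative only: resolve "auto" to "hi_res" when extraction is requested."""
    # An explicitly chosen strategy is left untouched.
    if strategy != "auto":
        return strategy
    # Either extraction option being set is enough to require "hi_res".
    if extract_images_in_pdf or extract_element_types:
        return "hi_res"
    return fallback


print(choose_pdf_strategy_sketch(extract_element_types=["Image"]))
```

The check uses truthiness, so an empty `extract_element_types` list is treated the same as not setting it; that choice is part of the sketch, not something the commit message states.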

References

#2324 - Fix: element extraction not working when using "auto" strategy for pdf

Author

christinestraub

Parents

dd144456
