unstructured
enhancement: remove duplicate embedded images
#2897
Merged

christinestraub
christinestraub feat: add `clean_pdfminer_duplicate_image_elements()`
cc91ca5e
christinestraub feat: add env_config `EMBEDDED_IMAGE_SAME_REGION_THRESHOLD`
7bb358a2
christinestraub Merge branch 'main' into feat/remove-duplicate-embedded-images
200e912a
christinestraub refactor: reorganize `clean_pdfminer_inner_elements()`
52e74192
christinestraub chore: update changelog & version
89b65e4f
christinestraub refactor
af6e082d
christinestraub test: add unit test
15e66d04
christinestraub test: fix lint error
d6d995b9
christinestraub Merge branch 'main' into feat/remove-duplicate-embedded-images
7024d2ce
christinestraub chore: bump version
f2833b9e
christinestraub marked this pull request as ready for review 2 years ago
christinestraub requested a review from badGarnet 2 years ago
christinestraub requested a review from scanny 2 years ago
christinestraub requested a review from cragwolfe 2 years ago
scanny commented on 2024-04-18
cragwolfe approved these changes on 2024-04-18
christinestraub refactor: remove unused `defaultdict`
79ca250a
christinestraub merged ac5048bf into main 2 years ago
christinestraub deleted the feat/remove-duplicate-embedded-images branch 2 years ago
