unstructured
affb9d60 - refactor: deduplicate PDF rendering by delegating to unstructured-inference (#4315)

Commit
27 days ago
refactor: deduplicate PDF rendering by delegating to unstructured-inference (#4315)

## Summary

- Delete `_render_pdf_pages` from `pdf_image_utils.py` (~70 lines)
- Delegate `convert_pdf_to_image` and `convert_pdf_to_images` to `unstructured-inference`'s implementation (which already has lazy per-page rendering since v1.5.5)
- Pass `env_config.PDF_RENDER_DPI` explicitly instead of relying on internal config
- Bump `unstructured-inference` dep to `>=1.6.2`

Peak memory for `path_only=True` drops from O(n_pages) to O(1 page), a 97% reduction on a 100-page PDF.

## Depends on

- [ ] Unstructured-IO/unstructured-inference#501 (make `dpi` explicit)

---------

Co-authored-by: codeflash-ai[bot] <178395242+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Kevin Turcios <turcioskevinr@gmail.com>
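
To illustrate why lazy per-page rendering keeps peak memory at roughly one page when only paths are needed, here is a minimal sketch. It is not the `unstructured-inference` implementation: `render_page`, `iter_rendered_pages`, and `render_to_paths` are hypothetical names, and the "renderer" just produces placeholder bytes so the example runs with the standard library alone. The `dpi` argument is passed explicitly, mirroring the intent of this change.

```python
import os
from typing import Iterator, List


def render_page(page_number: int, dpi: int) -> bytes:
    """Hypothetical stand-in for a real PDF page renderer.

    Returns placeholder bytes whose length grows with dpi, so the
    example does not need a PDF library.
    """
    return bytes([page_number % 256]) * dpi


def iter_rendered_pages(n_pages: int, dpi: int) -> Iterator[bytes]:
    """Yield one rendered page at a time instead of building a full list."""
    for page_number in range(1, n_pages + 1):
        yield render_page(page_number, dpi)


def render_to_paths(n_pages: int, dpi: int, output_dir: str) -> List[str]:
    """Write each page to disk as it is produced and keep only its path.

    Only the current page buffer is referenced at any time, so peak
    memory does not scale with the number of pages.
    """
    paths: List[str] = []
    for page_number, image in enumerate(iter_rendered_pages(n_pages, dpi), start=1):
        path = os.path.join(output_dir, f"page_{page_number}.bin")
        with open(path, "wb") as handle:
            handle.write(image)
        paths.append(path)
    return paths
```

Calling `render_to_paths(100, 200, some_dir)` would write 100 files while holding one page buffer in memory at a time; an eager variant that first collected every page into a list would hold all 100.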