Improve fast partition cold start (#4242)
Improve PDF fast strategy cold-start latency by lazy-loading hi-res-only
imports in
[pdf.py](https://github.com/Unstructured-IO/unstructured/blob/1c3d5e6ef7b6123a2d8739bf9a8c3afecc3dd127/unstructured/partition/pdf.py).
This reduces first-call startup overhead without changing partition
behavior.
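
As a rough illustration of the lazy-loading pattern (not the actual code in `pdf.py`), the sketch below defers an import until the hi_res branch runs. The names `partition_sketch`, `_dedupe_hi_res`, and `heavy_loaded` are hypothetical, and the stdlib module `difflib` is only a stand-in for an expensive hi-res-only dependency.

```python
import sys


def heavy_loaded() -> bool:
    """Report whether the stand-in heavy dependency has been imported."""
    return "difflib" in sys.modules


def _dedupe_hi_res(lines: list[str]) -> list[str]:
    # Deferred import: runs on the first hi_res call only, so callers that
    # use only the fast path never pay this module's import cost.
    import difflib  # stand-in for a heavy hi-res-only dependency (assumption)

    kept: list[str] = []
    for line in lines:
        # Drop a line that is nearly identical to the previously kept one.
        if kept and difflib.SequenceMatcher(None, kept[-1], line).ratio() >= 0.9:
            continue
        kept.append(line)
    return kept


def partition_sketch(text: str, strategy: str = "fast") -> list[str]:
    """Toy partitioner: returns the non-empty, stripped lines of `text`."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if strategy == "hi_res":
        # Only this branch touches the heavy dependency.
        return _dedupe_hi_res(lines)
    return lines
```

The trade-off is that the import cost moves to the first hi_res call rather than disappearing, and a missed or conditional import surfaces at call time instead of at module import time.
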
Local benchmarks show the fast strategy cold start dropping from 2.75s to
1.78s, a ~35% reduction in latency.
They also show a small hi_res slowdown (~2-4%), which is acceptable
given the fast-strategy improvement.
The benchmark was run on 6 PDFs:
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/DA-1p.pdf
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/chevron-page.pdf
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/embedded-images-tables.pdf
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/fake-memo-with-duplicate-page.pdf
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/interface-config-guide-p93.pdf
- https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/layout-parser-paper-fast.pdf
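
For readers who want to reproduce this kind of measurement, here is a minimal sketch of cold-start timing. It is not the benchmark script from this PR. The helper names `time_cold` and `percent_reduction` are hypothetical, and the measured time includes interpreter startup because each run uses a fresh process.

```python
import subprocess
import sys
import time


def time_cold(snippet: str) -> float:
    """Wall-clock seconds to run `snippet` in a fresh Python process.

    A new interpreter has an empty import cache, so the snippet pays all
    of its import costs, which is what "cold start" means here.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", snippet], check=True)
    return time.perf_counter() - start


def percent_reduction(before: float, after: float) -> float:
    """Latency reduction as a percentage of the `before` value."""
    return (before - after) / before * 100.0
```

A cold run could then pass a snippet such as `from unstructured.partition.pdf import partition_pdf; partition_pdf(filename="DA-1p.pdf", strategy="fast")`, assuming the file exists locally. Applied to the figures above, `percent_reduction(2.75, 1.78)` gives roughly 35.
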
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Touches core PDF partitioning by changing import timing and locations;
behavior should be unchanged but there is some risk of
missed/conditional imports causing runtime errors in less-tested
hi_res/OCR/analysis paths.
>
> **Overview**
> Improves PDF `fast` strategy cold-start performance by **lazy-loading
hi-res-only dependencies** in `unstructured/partition/pdf.py` (moving
several `pdf_image`/`unstructured_inference`-related imports into
`_partition_pdf_or_image_local` and other hi-res/OCR-only code paths),
while keeping the `fast` path lighter.
>
> Adds `scripts/performance/quick_partition_bench.py` for quick local
cold vs. warm partition timing across one or more PDFs, updates the table
metrics helper to import `convert_pdf_to_images` from `pdf_image_utils`,
and bumps the library version to `0.20.4` with a corresponding changelog
entry.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
b66ae0e81ec30ad0910631d78c3dec12f1320a38.</sup>
<!-- /CURSOR_SUMMARY -->