Optimize CustomPDFPageInterpreter._patch_current_chars_with_render_mode
Runtime improvement: the change reduces average call time from ~174μs to ~128μs (≈35% faster overall), with the biggest wins on workloads that iterate many items (e.g., the 1,000-item test improved ~49%).
What changed
- Combined two checks into one short‑circuiting conditional:
- Old: check hasattr(item, "rendermode") first, continue if present, then isinstance(item, LTChar) before assignment.
- New: if isinstance(item, LTChar) and not hasattr(item, "rendermode"): assign.
- Removed the explicit continue and the separate hasattr branch; the logic is identical from a correctness perspective.
Why this speeds things up
- Fewer attribute lookups per loop iteration. In Python attribute access (hasattr / attribute lookup) is relatively expensive. The original code executed hasattr(item, "rendermode") for every item, even for non-LTChar objects. By testing isinstance(item, LTChar) first, the hasattr call is avoided for non-LTChar items (the common case), so we save an attribute lookup per non-LTChar item.
- Short‑circuit evaluation reduces bytecode and branching overhead (no separate continue branch).
- These savings compound in the hot path: this method is invoked from do_TJ and do_Tj while parsing text, so it runs many times per page. The profiler and tests show the loop-level savings lead to measurable end-to-end runtime improvement.
Profiler & test evidence
- Overall profiler total time dropped from 0.001776s → 0.001371s.
- The large-scale test (1000 LTChar objects) went from 113μs → 76.2μs (~49.5% faster), demonstrating the optimization scales with number of items.
- Many unit/test cases also show smaller but consistent improvements (see annotated_tests). A single small regression was observed in one micro-benchmark (+~5% in a very narrow case where items are always LTChar and already patched), which is an acceptable trade-off given the overall runtime/throughput gains.
Impact on workloads
- Best for PDFs with many text items or mixed-type _objs lists (many non-LTChar items): savings are greatest because we avoid unnecessary hasattr calls for non-LTChar entries.
- Safe to merge: behavior is preserved (pre-existing rendermode attributes are still honored), and dependencies/visibility are unchanged.
Summary
- Primary benefit: reduced runtime (35% overall, large improvements on heavier inputs).
- Cause: eliminated redundant attribute checks via a single short‑circuiting conditional, lowering per-item overhead in a hot path.
- Trade-off: negligible; one tiny micro-benchmark showed a small regression, but the overall throughput and real-workload performance improved significantly.