unstructured
85200c3d - fix: remove redundant do_Tj override that double-patches chars

Commit
44 days ago
fix: remove redundant do_Tj override that double-patches chars

pdfminer's base do_Tj delegates to self.do_TJ([s]), which already
dispatches to the overridden do_TJ. The do_Tj override was patching
the same char range a second time.

Repro (add print traces to do_TJ/do_Tj, run against any PDF):

    from unstructured.partition.pdf_image.pdfminer_utils import open_pdfminer_pages_generator

    with open("example-docs/pdf/reliance.pdf", "rb") as f:
        for page, layout in open_pdfminer_pages_generator(f):
            break

Before this fix, every Tj op produces two patch calls with the same
start index:

    [TRACE] do_TJ patching from 9
    [TRACE] do_Tj patching from 9   <- redundant
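
The double dispatch described in the commit message can be reproduced without pdfminer. The following is a minimal sketch, not the project's actual code: `BaseInterpreter`, `PatchingInterpreter`, `RedundantInterpreter`, `_patch_from`, and `trace` are hypothetical stand-ins, and the only behavior taken from the message is that the base `do_Tj` delegates to `self.do_TJ([s])`.

```python
class BaseInterpreter:
    """Hypothetical stand-in for the base interpreter: do_Tj delegates to do_TJ."""

    def __init__(self):
        self.chars = []

    def do_TJ(self, seq):
        # Append every character of every string in the sequence.
        for s in seq:
            self.chars.extend(s)

    def do_Tj(self, s):
        # Delegation through self, so an overridden do_TJ is the one that runs.
        self.do_TJ([s])


class PatchingInterpreter(BaseInterpreter):
    """After the fix: only do_TJ is overridden."""

    def __init__(self):
        super().__init__()
        self.patch_calls = []

    def _patch_from(self, start, source):
        # Record which method patched and from which start index.
        self.patch_calls.append((source, start))

    def do_TJ(self, seq):
        start = len(self.chars)
        super().do_TJ(seq)
        self._patch_from(start, "do_TJ")


class RedundantInterpreter(PatchingInterpreter):
    """Before the fix: do_Tj is overridden too and patches the same range again."""

    def do_Tj(self, s):
        start = len(self.chars)
        super().do_Tj(s)  # already reaches the overridden do_TJ, which patches
        self._patch_from(start, "do_Tj")  # second patch, same start index


def trace(interpreter_cls):
    """Run two Tj ops and return the recorded patch calls."""
    interp = interpreter_cls()
    interp.do_Tj("abcdefghi")  # 9 chars, so the next op starts at index 9
    interp.do_Tj("xyz")
    return interp.patch_calls


if __name__ == "__main__":
    print("before:", trace(RedundantInterpreter))
    print("after: ", trace(PatchingInterpreter))
```

In this sketch the "before" variant records two patch calls per `Tj` op with the same start index, while the "after" variant records one, so dropping the `do_Tj` override loses no coverage.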