fix: pdfminer drops extractable text (#4310)
<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Medium Risk**
> Changes pdfminer integration to override CID font/CMap handling and
introduces custom stream decoding/parsing, which can affect text
extraction behavior and performance on diverse PDFs (mitigated by
size/mapping caps).
>
> **Overview**
> Fixes PDFs where **body text was silently dropped** because CIDFonts
used an *embedded Encoding CMap stream* that `pdfminer.six` doesn’t
resolve.
>
> Adds a bounded embedded-CMap decoder/parser and wires it in via
`CustomPDFCIDFont` + `CustomPDFResourceManager` so `init_pdfminer()`
constructs CID fonts with a parsed CMap (including `WMode`), with
DoS-oriented caps on decompression and total mappings.
>
> Adds a fixture-driven regression test covering both the `FAST` and
`HI_RES` strategies, plus targeted unit tests for CMap parsing and stream
decoding, and bumps the version to `0.22.12` with a changelog entry.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
4326b15f6c400e81112f894576941d28fb150da7.</sup>
<!-- /CURSOR_SUMMARY -->
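
For reviewers, the two caps mentioned in the overview (decompressed size and total mappings) can be illustrated with a minimal sketch. This is not the code in this change: `bounded_inflate`, `parse_cidranges`, `MAX_DECODED_BYTES`, and `MAX_MAPPINGS` are hypothetical names and values, the stream is assumed to be Flate-encoded, and only `begincidrange` blocks are handled.

```python
import re
import zlib

# Hypothetical limits, chosen for illustration only.
MAX_DECODED_BYTES = 1 << 20
MAX_MAPPINGS = 65536

_RANGE_BLOCK = re.compile(rb"begincidrange(.*?)endcidrange", re.S)
_RANGE_ENTRY = re.compile(rb"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(\d+)")


def bounded_inflate(data: bytes, limit: int = MAX_DECODED_BYTES) -> bytes:
    """Inflate zlib data, refusing output larger than `limit` bytes."""
    decoder = zlib.decompressobj()
    # Ask for one byte more than the limit so an overrun is detectable
    # without ever materializing the full (possibly huge) output.
    out = decoder.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decoded CMap stream exceeds size cap")
    return out


def parse_cidranges(cmap: bytes, max_mappings: int = MAX_MAPPINGS) -> dict:
    """Build a code -> CID map from begincidrange blocks, with a total cap."""
    code2cid = {}
    for block in _RANGE_BLOCK.findall(cmap):
        for lo_hex, hi_hex, cid in _RANGE_ENTRY.findall(block):
            lo, hi, base = int(lo_hex, 16), int(hi_hex, 16), int(cid)
            if hi < lo:
                continue  # skip malformed ranges
            span = hi - lo + 1
            # Check the cap before expanding the range, not after.
            if len(code2cid) + span > max_mappings:
                raise ValueError("CMap exceeds mapping cap")
            for offset in range(span):
                code2cid[lo + offset] = base + offset
    return code2cid
```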
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>