perf: speed up standardize_quotes with str.translate() (#4314)
## Summary
- Replace per-character regex with a precomputed `str.maketrans()` +
`str.translate()` table for `standardize_quotes`
- Covers all 36 Unicode fancy-quote codepoints (double + single) from
the original regex
- Adds a benchmark (`test_unstructured/benchmarks/`) to track
`standardize_quotes` performance
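The approach summarized above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the mapping covers just a small subset of fancy-quote codepoints (the real table covers 36), and the names `_QUOTE_TABLE` and `standardize_quotes_sketch` are hypothetical.

```python
# Illustrative subset only; the table in this PR covers all 36 codepoints.
_DOUBLE_QUOTES = "\u201c\u201d\u201e\u201f"  # left, right, low-9, high-reversed-9 double quotes
_SINGLE_QUOTES = "\u2018\u2019\u201a\u201b"  # the corresponding single quotes

# Build the translation table once at import time, so every call is a
# single pass over the string instead of repeated regex substitutions.
_QUOTE_TABLE = str.maketrans(
    {**dict.fromkeys(_DOUBLE_QUOTES, '"'), **dict.fromkeys(_SINGLE_QUOTES, "'")}
)


def standardize_quotes_sketch(text: str) -> str:
    """Replace fancy quotes with their ASCII equivalents (hypothetical helper)."""
    return text.translate(_QUOTE_TABLE)
```

Characters that are not keys in the table pass through unchanged, which is why a precomputed table can act as a drop-in replacement for a substitution loop.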
### Benchmark (Azure Standard_D8s_v5: 8 vCPU Intel Xeon Platinum 8473C, 32 GiB RAM)
## Benchmark: `origin/main` vs `codeflash/op`
### test_benchmark_standardize_quotes
| | Min | Median | Mean | OPS | Rounds |
|:---|---:|---:|---:|---:|---:|
| `origin/main` (base) | 161.25μs | 199.57μs | 200.72μs | 4.98 Kops/s | 5,461 |
| `codeflash/op` (head) | 99.17μs | 126.40μs | 127.86μs | 7.82 Kops/s | 10,581 |
| **Speedup** | **🟢 1.63x** | **🟢 1.58x** | **🟢 1.57x** | **🟢 1.57x** | |
| Function | base (μs) | head (μs) | Improvement | Speedup |
|:---|---:|---:|:---|---:|
| `standardize_quotes` | 128.60μs | 53.86μs | `██████░░░░` +58% | 🟢 2.39x |
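For readers checking the figures, the speedup column can be recomputed from the base and head timings as `base / head`. The short sketch below does that with the numbers from the tables above. It also computes the fractional time reduction, which is assumed here to be what the "+58%" improvement refers to. The helper names are hypothetical.

```python
def speedup(base_us: float, head_us: float) -> float:
    """How many times faster head is than base."""
    return base_us / head_us


def time_reduction(base_us: float, head_us: float) -> float:
    """Fraction of the base time that was eliminated."""
    return (base_us - head_us) / base_us


# Timings in microseconds, copied from the benchmark tables.
timings = {
    "min": (161.25, 99.17),
    "median": (199.57, 126.40),
    "mean": (200.72, 127.86),
    "standardize_quotes": (128.60, 53.86),
}

for name, (base, head) in timings.items():
    print(f"{name}: {speedup(base, head):.2f}x, {time_reduction(base, head):.0%} less time")
```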
---
*Generated by codeflash optimization agent*
<details>
<summary><b>Reproduce the benchmark locally</b></summary>
This PR includes a pytest-benchmark test at
`test_unstructured/benchmarks/test_benchmark_standardize_quotes.py`. To
run it:
```bash
pip install pytest-benchmark
pytest test_unstructured/benchmarks/test_benchmark_standardize_quotes.py --benchmark-only
```
To compare against `main`:
```bash
# Run on main and save baseline
git stash && git checkout main
pytest test_unstructured/benchmarks/test_benchmark_standardize_quotes.py --benchmark-only --benchmark-save=baseline
# Run on this branch and compare
git checkout - && git stash pop
pytest test_unstructured/benchmarks/test_benchmark_standardize_quotes.py --benchmark-only --benchmark-compare=0001_baseline
```
</details>
## Changelog
Added entry in `CHANGELOG.md` under 0.22.13.
## Test plan
- [x] Benchmarked on Azure VM (Standard_D8s_v5)
- [x] Existing unit tests pass — `standardize_quotes` is a drop-in
replacement
- [x] All 36 quote codepoints covered by the translation table
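One way to sanity-check the drop-in claim is to compare a regex-based version against the translate-based one on every mapped codepoint. The sketch below is a hedged illustration, not the test suite in this PR: the codepoint subset, `regex_version` (a stand-in for the original implementation), and `check_equivalence` are all assumptions.

```python
import re

# Hypothetical subset of fancy-quote codepoints; the real table has 36.
DOUBLE = "\u201c\u201d\u201e\u201f"
SINGLE = "\u2018\u2019\u201a\u201b"

TABLE = str.maketrans({**dict.fromkeys(DOUBLE, '"'), **dict.fromkeys(SINGLE, "'")})
DOUBLE_RE = re.compile(f"[{DOUBLE}]")
SINGLE_RE = re.compile(f"[{SINGLE}]")


def regex_version(text: str) -> str:
    # Stand-in for a regex-based implementation (assumed, not the original code).
    return SINGLE_RE.sub("'", DOUBLE_RE.sub('"', text))


def translate_version(text: str) -> str:
    return text.translate(TABLE)


def check_equivalence() -> bool:
    # Each mapped codepoint on its own, plus a few mixed and edge-case strings.
    samples = list(DOUBLE + SINGLE) + ["plain ascii", "\u201cmixed\u2019 text", ""]
    return all(regex_version(s) == translate_version(s) for s in samples)
```

Iterating over every key in the table, rather than a few sample sentences, is what guards against a codepoint being dropped when the table is built.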
---------
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>