[mypyc] Use cached ASCII characters in `CPyStr_GetItem` (#21035)
For characters with code points below 256, call `PyUnicode_FromOrdinal()`,
which returns CPython's cached single-character Latin-1 string objects,
instead of allocating a new PyUnicode object on every `str[i]` access.
This avoids allocation and deallocation overhead in character-scanning
hot loops.
Characters with code points >= 256 (the rest of the BMP and the
supplementary planes) keep the original `PyUnicode_New` allocation path.
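The cache can be observed from plain Python. The sketch below relies on a CPython implementation detail: `chr()` is implemented with `PyUnicode_FromOrdinal()`, so two calls for the same code point below 256 should hand back the same object, while larger code points get a fresh object each time. `is_cached` is a hypothetical helper written for illustration, not part of mypyc or of this change.

```python
def is_cached(code_point: int) -> bool:
    """Return True if two chr() calls for code_point yield the same object.

    Assumption: running on CPython, where chr() goes through
    PyUnicode_FromOrdinal(). Other interpreters may behave differently.
    """
    first = chr(code_point)
    second = chr(code_point)  # both objects stay alive for the identity check
    return first is second


if __name__ == "__main__":
    # One sample per row of the benchmark table: ASCII, Latin-1,
    # BMP, supplementary.
    for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
        print(f"U+{cp:04X}: cached={is_cached(cp)}")
```

On current CPython versions the first two code points are expected to report as cached and the last two as not cached, which matches the split between the fast path and the unchanged `PyUnicode_New` path.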
I ran the following micro-benchmark: scan a 50k-character string with
`s[i]` in a loop (5000 iterations per string type; times are per
iteration):
| String type | Before (ms/iter) | After (ms/iter) | Speedup |
|--------------------------|-------------------|-----------------|-----------------|
| ASCII (0–127) | 0.651 | 0.166 | **3.9x (-75%)** |
| Latin-1 (128–255) | 0.752 | 0.162 | **4.6x (-78%)** |
| BMP (256–65535) | 0.901 | 0.809 | no change |
| Supplementary (>65535) | 0.842 | 0.743 | no change |
| Mixed (25% each) | 0.817 | 0.542 | **1.5x (-34%)** |
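For anyone who wants to reproduce numbers of this kind, here is a minimal sketch of such a benchmark. It is not the script used for the table above: the function names, the sample characters, and the timing method are all assumptions. It also only exercises `CPyStr_GetItem` when the module is compiled with mypyc, and whether `s[i]` is lowered to that helper in this exact loop may depend on the mypyc version.

```python
import time

N = 50_000       # string length, taken from the description above
REPEATS = 5_000  # iteration count, taken from the description above


def make_strings(n: int) -> dict[str, str]:
    """Build one n-character string per table row (n should be divisible by 4)."""
    return {
        "ascii": "a" * n,                    # U+0061
        "latin1": "\u00e9" * n,              # U+00E9
        "bmp": "\u4e2d" * n,                 # U+4E2D
        "supplementary": "\U0001f600" * n,   # U+1F600
        "mixed": "a\u00e9\u4e2d\U0001f600" * (n // 4),  # 25% each
    }


def scan(s: str) -> int:
    """Index every character with s[i]; count 'a' so the loop has a result."""
    count = 0
    for i in range(len(s)):
        if s[i] == "a":
            count += 1
    return count


def bench(s: str, repeats: int) -> float:
    """Return the average time of one scan(s) call in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        scan(s)
    return (time.perf_counter() - start) * 1000.0 / repeats


if __name__ == "__main__":
    for name, s in make_strings(N).items():
        print(f"{name}: {bench(s, REPEATS):.3f} ms/iter")
```

Run the compiled module once before and once after the change to get the two columns. Run under the plain interpreter, the script measures CPython's own indexing instead.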
This was co-authored with @tobymao.