mypy
6bcd02e3 - [mypyc] Use cached ASCII characters in `CPyStr_GetItem` (#21035)

Commit

101 days ago

[mypyc] Use cached ASCII characters in `CPyStr_GetItem` (#21035)

For characters < 256, use `PyUnicode_FromOrdinal()`, which returns CPython's cached single-character Latin-1 string objects instead of allocating a new PyUnicode object on every `str[i]` access. This avoids allocation and deallocation overhead in character-scanning hot loops.

Characters >= 256 (BMP, supplementary) keep the original `PyUnicode_New` allocation path unchanged.

I ran the following micro-benchmark: scan a 50k-character string with `s[i]` in a loop (the benchmark was repeated 5000 times):

| String type            | Before (ms/iter) | After (ms/iter) | Speedup         |
|------------------------|------------------|-----------------|-----------------|
| ASCII (0–127)          | 0.651            | 0.166           | **3.9x (-75%)** |
| Latin-1 (128–255)      | 0.752            | 0.162           | **4.6x (-78%)** |
| BMP (256–65535)        | 0.901            | 0.809           | no change       |
| Supplementary (>65535) | 0.842            | 0.743           | no change       |
| Mixed (25% each)       | 0.817            | 0.542           | **1.5x (-34%)** |

This was coauthored with @tobymao
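The benchmark script itself is not included in the commit message. As a rough sketch of how a scan like the one described might be set up, assuming one test string per code point range and a plain `s[i]` loop: the names `scan`, `make_string`, and `CASES` are hypothetical, and the iteration count is reduced from the 5000 repetitions mentioned above. Run under plain CPython, this only exercises the interpreter's own indexing path; reproducing the reported numbers would require compiling the module with mypyc before and after the change.

```python
import timeit


def scan(s: str) -> int:
    """Index every position with s[i] and return how many were read."""
    n = 0
    for i in range(len(s)):
        c = s[i]  # the operation that CPyStr_GetItem implements under mypyc
        n += 1
    return n


def make_string(lo: int, hi: int, length: int = 50_000) -> str:
    """Build a string that cycles through the code points lo..hi (inclusive)."""
    span = hi - lo + 1
    return "".join(chr(lo + (i % span)) for i in range(length))


# One 50k-character string per range; the exact bounds are an assumption.
CASES = {
    "ASCII": make_string(0x20, 0x7E),
    "Latin-1": make_string(0x80, 0xFF),
    "BMP": make_string(0x100, 0xD7FF),  # stops below the surrogate block
    "Supplementary": make_string(0x10000, 0x10FFFF),
}

if __name__ == "__main__":
    repeats = 20  # hypothetical; far fewer than the 5000 used in the commit
    for name, s in CASES.items():
        total = timeit.timeit(lambda: scan(s), number=repeats)
        print(f"{name}: {total / repeats * 1000:.3f} ms/iter")
```

A "Mixed" case could be added by concatenating a quarter-length string from each range.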

References

#21035 - [mypyc] Use cached ASCII characters in `CPyStr_GetItem`

Author

VaggelisD

Parents

0183a217
