nvda
8a32a09e - Normalize decorative Unicode letters not handled by NFKC (#19608)

Commit
15 days ago
Normalize decorative Unicode letters not handled by NFKC (#19608)

Closes #17120

Summary of the issue:
NVDA's Unicode normalization (NFKC) does not decompose certain decorative Unicode letter characters, causing them to be read as their full Unicode name or as silence. This affects negative squared Latin capital letters (U+1F170–U+1F189), negative circled Latin capital letters (U+1F150–U+1F169), and regional indicator symbol letters (U+1F1E6–U+1F1FF).

Description of user facing changes:
When Unicode normalization is enabled, decorative Unicode letters such as negative squared (🅰🅱🅲), negative circled, and regional indicator symbol characters are now correctly read as their base Latin letters (A, B, C, etc.) in both speech and braille.

Description of developer facing changes:
Added _buildSupplementaryNormalizationTable() which builds a translation table for the three affected Unicode ranges.
Extended unicodeNormalize() to apply the supplementary table via str.translate() before standard NFKC normalization.
Extended isUnicodeNormalized() to detect characters in the supplementary table.
Applied the supplementary table in UnicodeNormalizationOffsetConverter.__init__() so the braille code path also handles these characters correctly.

Description of development approach:
The standard unicodedata.normalize("NFKC", ...) does not define decompositions for these Supplementary Multilingual Plane characters. A supplementary translation table maps each codepoint to its plain Latin letter. This table is applied via str.translate() before NFKC normalization in both the speech path (unicodeNormalize()) and the braille path (UnicodeNormalizationOffsetConverter). Since all mappings are single-codepoint to single-codepoint, the existing offset converter logic handles them correctly without changes to the offset mapping algorithm.