nvda
8a32a09e - Normalize decorative Unicode letters not handled by NFKC (#19608)

Commit

49 days ago

Normalize decorative Unicode letters not handled by NFKC (#19608)

Closes #17120

Summary of the issue:
NVDA's Unicode normalization (NFKC) does not decompose certain decorative Unicode letter characters, causing them to be read as their full Unicode name or as silence. This affects negative squared Latin capital letters (U+1F170–U+1F189), negative circled Latin capital letters (U+1F150–U+1F169), and regional indicator symbol letters (U+1F1E6–U+1F1FF).

Description of user facing changes:
When Unicode normalization is enabled, decorative Unicode letters such as negative squared (🅰🅱🅲), negative circled, and regional indicator symbol characters are now correctly read as their base Latin letters (A, B, C, etc.) in both speech and braille.

Description of developer facing changes:
Added _buildSupplementaryNormalizationTable(), which builds a translation table for the three affected Unicode ranges.
Extended unicodeNormalize() to apply the supplementary table via str.translate() before standard NFKC normalization.
Extended isUnicodeNormalized() to detect characters in the supplementary table.
Applied the supplementary table in UnicodeNormalizationOffsetConverter.__init__() so the braille code path also handles these characters correctly.

Description of development approach:
The standard unicodedata.normalize("NFKC", ...) does not define decompositions for these Supplementary Multilingual Plane characters. A supplementary translation table maps each codepoint to its plain Latin letter. This table is applied via str.translate() before NFKC normalization in both the speech path (unicodeNormalize()) and the braille path (UnicodeNormalizationOffsetConverter). Since all mappings are single-codepoint to single-codepoint, the existing offset converter logic handles them correctly without changes to the offset mapping algorithm.
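The approach described in the commit message can be illustrated with a minimal standalone sketch. This is not the code from the commit: the names build_supplementary_table, SUPPLEMENTARY_TABLE, normalize_text and is_text_normalized are hypothetical stand-ins, and the sketch assumes only the three ranges listed above, each covering 26 consecutive codepoints for A through Z.

```python
import unicodedata

# Hypothetical start codepoints for the three ranges named in the commit
# message; each range is assumed to hold 26 consecutive letters (A-Z).
_RANGE_STARTS = (
    0x1F150,  # negative circled Latin capital letters
    0x1F170,  # negative squared Latin capital letters
    0x1F1E6,  # regional indicator symbol letters
)


def build_supplementary_table() -> dict[int, str]:
    """Map each decorative codepoint to its plain Latin capital letter."""
    table: dict[int, str] = {}
    for start in _RANGE_STARTS:
        for offset in range(26):
            table[start + offset] = chr(ord("A") + offset)
    return table


SUPPLEMENTARY_TABLE = build_supplementary_table()


def normalize_text(text: str) -> str:
    """Apply the supplementary table first, then standard NFKC."""
    return unicodedata.normalize("NFKC", text.translate(SUPPLEMENTARY_TABLE))


def is_text_normalized(text: str) -> bool:
    """True only if NFKC and the supplementary table would both leave text unchanged."""
    if any(ord(char) in SUPPLEMENTARY_TABLE for char in text):
        return False
    return unicodedata.is_normalized("NFKC", text)
```

Because every entry maps one codepoint to one codepoint, the translate() step never changes the length of the string, which is why an offset mapping computed afterwards stays aligned with the original text.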

References

#19608 - Normalize decorative Unicode letters not handled by NFKC

Author

bramd

Parents

c1f7bc26
