nvda
2238cd96 - TextInfo move by codepoint characters function (#16219)

Commit

2 years ago

TextInfo move by codepoint characters function (#16219)

This function is needed for both #8518 and #16050.

Summary of the issue:

Suppose we have a TextInfo that represents a paragraph of text:

```
> s = paragraphInfo.text
> s
'Hello, world!\r'
```

Suppose that we would like to put the cursor at the first letter of the word 'world'. That means jumping to index 7:

```
> s[7:]
'world!\r'
```

The problem is that calling paragraphInfo.move(UNIT_CHARACTER, 7, "start") is not guaranteed to achieve the desired effect. There are three problems with this approach:

1. In wide-character (UTF-16) encoding, characters outside the Basic Multilingual Plane (4 bytes in UTF-16) are represented as two surrogate code units, whereas in a Python string they are represented by a single character.
2. In non-offset TextInfos (e.g. UIATextInfo) there is no guarantee that TextInfo.move(UNIT_CHARACTER, 1) actually moves by exactly 1 character. A good illustration is Microsoft Word with UIA always enabled, where the first character of a bullet list item is represented by three Python characters:
   ◦ the bullet character "•"
   ◦ a tab character \t
   ◦ the first character of the list item itself.
3. The third problem is the performance of TextInfo.move(UNIT_CHARACTER) in some implementations. In particular, moving by 10000 characters in Notepad++ takes over a second on a reasonably modern PC. I might not need to move by 10000 characters in my upcoming PRs, but I will certainly need to move by a few thousand: for sentence navigation I need to move within a paragraph, and large paragraphs in typical texts can easily be a few thousand characters long. I need to find both the beginning and the end TextInfos, and if each operation takes, say, 200 ms, then we would be wasting almost half a second just moving by characters. Since there were previous concerns about sentence navigation not being fast enough, I would like to introduce this efficient implementation.
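Problem 1 can be reproduced in plain Python, without NVDA. The emoji in the illustrative string below lies outside the Basic Multilingual Plane. It therefore counts as one character in a Python string but as two UTF-16 code units (a surrogate pair). The variable names are only for illustration.

```python
text = "Hi 😀 world"

# Offset of "world" counted in Python characters (code points).
pyOffset = text.index("world")

# The same position counted in UTF-16 code units, as a wide-character API
# would count it; "utf-16-le" is used so that no byte order mark is added.
wideOffset = len(text[:pyOffset].encode("utf-16-le")) // 2

print(pyOffset, wideOffset)  # → 5 6
```

An API that counts in UTF-16 code units would therefore place offset 5 on the space before 'world', one position short.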
With this PR, jumping to the start of 'world' can be done efficiently:

```
> info = paragraphInfo.moveToPythonicOffset(7)
> info.setEndPoint(paragraphInfo, "endToEnd")
> info.text
'world!\r'
```

Description of development approach:

1. For the general case, I implemented a binary-search-like algorithm, explained in detail in the code; see the moveToPythonicOffset function in textInfos\__init__.py.
2. I provided optimized implementations for OffsetsTextInfo and CompoundTextInfo.
3. I refactored textUtils.py to follow an object-oriented style. I implemented UTF8OffsetConverter and a dummy IdentityOffsetConverter, together with their abstract base class, and a function getOffsetConverter that selects the correct converter based on encoding. I renamed a couple of methods of WideStringOffsetConverter to remove the word "wide", since I now want to use similar methods for the UTF-8 converter, which has nothing to do with wide strings.
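The PR's general-case algorithm lives in moveToPythonicOffset and is not reproduced here. The minimal sketch below only illustrates why a binary-search-like approach addresses problems 2 and 3. It rests on several assumptions:

- `HypotheticalTextInfo`, `textUpTo` and `unitsForPythonicOffset` are hypothetical names, not NVDA's API.
- A TextInfo's text is modelled as a list of move units, where one unit may span several Python characters.
- Fetching text is the expensive step.
- Prefix length never decreases as more units are included.

It is a plain lower-bound binary search over unit counts, not the PR's actual implementation.

```python
from typing import List


class HypotheticalTextInfo:
    """Toy stand-in for a non-offset TextInfo (hypothetical, not NVDA's API).

    Text is stored as a list of "move units": one character move may cover
    several Python characters, e.g. "•\tH" at the start of a Word list item.
    """

    def __init__(self, units: List[str]) -> None:
        self.units = units
        self.probes = 0  # number of (expensive) text fetches performed

    def unitCount(self) -> int:
        return len(self.units)

    def textUpTo(self, n: int) -> str:
        # Models "collapse to start, move end by n units, read .text".
        self.probes += 1
        return "".join(self.units[:n])


def unitsForPythonicOffset(info: HypotheticalTextInfo, offset: int) -> int:
    """Return the smallest unit count whose prefix text has >= offset characters.

    If that prefix is longer than `offset`, the offset falls inside a
    multi-character unit. Assumes prefix length never decreases as n grows,
    which is what makes binary search valid.
    """
    lo, hi = 0, info.unitCount()
    if offset < 0 or offset > len(info.textUpTo(hi)):
        raise ValueError("offset out of range")
    while lo < hi:
        mid = (lo + hi) // 2
        if len(info.textUpTo(mid)) < offset:
            lo = mid + 1  # prefix too short: answer is to the right of mid
        else:
            hi = mid  # prefix long enough: answer is mid or to its left
    return lo


# Word-style list item: bullet, tab and "H" form a single move unit.
units = ["•\tH"] + list("ello, world!\r")
info = HypotheticalTextInfo(units)
text = "".join(units)
n = unitsForPythonicOffset(info, text.index("world"))
print(n, repr("".join(units[n:])))  # → 7 'world!\r'
```

On a 10000-unit text this search performs at most 15 text fetches (one range check plus 14 halvings), instead of one move per character.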

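The converter refactoring (an abstract base class, per-encoding converters, an identity converter and a factory selecting one by encoding) can be pictured with the following minimal sketch. Every name in it is hypothetical and simplified, including `OffsetConverterSketch`, `strToEncoded`, `encodedToStr` and `getConverterSketch`. NVDA's actual classes in textUtils.py have their own method names and signatures, which this sketch does not reproduce.

```python
import codecs
from abc import ABC, abstractmethod
from typing import Optional, Type


class OffsetConverterSketch(ABC):
    """Hypothetical base class: maps Python (code point) offsets to offsets
    in an encoded form of the same text, and back."""

    def __init__(self, text: str) -> None:
        self.text = text

    @abstractmethod
    def strToEncoded(self, strOffset: int) -> int: ...

    @abstractmethod
    def encodedToStr(self, encodedOffset: int) -> int: ...


class _CodecConverterSketch(OffsetConverterSketch):
    encoding = "utf-8"
    unitSize = 1  # bytes per code unit of the encoding

    def _units(self, s: str) -> int:
        return len(s.encode(self.encoding)) // self.unitSize

    def strToEncoded(self, strOffset: int) -> int:
        return self._units(self.text[:strOffset])

    def encodedToStr(self, encodedOffset: int) -> int:
        # Walk code points until the encoded length reaches the target; an
        # offset inside a surrogate pair or UTF-8 sequence rounds up.
        units = 0
        for i, ch in enumerate(self.text):
            if units >= encodedOffset:
                return i
            units += self._units(ch)
        return len(self.text)


class Utf16ConverterSketch(_CodecConverterSketch):
    encoding, unitSize = "utf-16-le", 2  # "-le": no byte order mark


class Utf8ConverterSketch(_CodecConverterSketch):
    encoding, unitSize = "utf-8", 1


class IdentityConverterSketch(OffsetConverterSketch):
    # Plain Python strings: offsets already count code points.
    def strToEncoded(self, strOffset: int) -> int:
        return strOffset

    def encodedToStr(self, encodedOffset: int) -> int:
        return encodedOffset


def getConverterSketch(encoding: Optional[str]) -> Type[OffsetConverterSketch]:
    """Pick a converter class from an encoding name (None means Python str)."""
    if encoding is None:
        return IdentityConverterSketch
    name = codecs.lookup(encoding).name  # normalizes "utf_16_le", "UTF8", ...
    if name == "utf-16-le":
        return Utf16ConverterSketch
    if name == "utf-8":
        return Utf8ConverterSketch
    raise ValueError(f"no converter for encoding {encoding!r}")


conv = getConverterSketch("utf_16_le")("Hi 😀 world")
print(conv.strToEncoded(5), conv.encodedToStr(6))  # → 6 5
```

Keeping the selection in one factory function means callers only pass an encoding name and never branch on it themselves. Unknown encodings fail loudly rather than silently falling back to identity offsets.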
References

#16219 - TextInfo move by codepoint characters function

Author

mltony

Parents

25f562f7
