ruff
8eef2fca - [ty] Replace `strsim` with CPython-based Levenshtein implementation (#23291)

Commit
109 days ago
[ty] Replace `strsim` with CPython-based Levenshtein implementation (#23291)

## Summary

For a couple of diagnostics currently, we add a "Did you mean...?" diagnostic hint if it appears like there's an obvious typo that caused us to emit an error. The "Did you mean...?" suggestion is generated via the `strsim` Levenshtein implementation on `crates.io`.

This PR replaces the `strsim` implementation of Levenshtein used to create these hints with a custom Levenshtein implementation based on the one that CPython itself uses to create these hints:

```pycon
>>> class Foo:
...     xyxy = 42
...
>>> Foo.xyxyz
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    Foo.xyxyz
AttributeError: type object 'Foo' has no attribute 'xyxyz'. Did you mean: 'xyxy'?
```

The added tests are also derived from CPython's test suite.

The motivation for copying CPython's implementation almost exactly is that CPython has had this feature for several Python versions now, and during that time many bug reports have been filed regarding incorrect suggestions, which have since been fixed. This implementation is thus very well "battle-tested" by this point; we can say with a reasonable degree of confidence that it gives good suggestions for typos in the Python context.

The ecosystem report on this PR bears out that this is an improvement. We see bad suggestions going away:

```diff
- [error] invalid-key - Unknown key "pair" for TypedDict `RPCAnalyzedDFMsg` - did you mean "data"?
+ [error] invalid-key - Unknown key "pair" for TypedDict `RPCAnalyzedDFMsg`: Unknown key "pair"
```

and good suggestions being added:

```diff
- [error] invalid-key - Unknown key "old_entity_id" for TypedDict `_EventEntityRegistryUpdatedData_CreateRemove`: Unknown key "old_entity_id"
+ [error] invalid-key - Unknown key "old_entity_id" for TypedDict `_EventEntityRegistryUpdatedData_CreateRemove` - did you mean "entity_id"?
```

This Levenshtein implementation was originally proposed in #18705, and then again in #18751. Those PRs also made other changes to use the Levenshtein implementation in certain other areas, however, where computing the list of suggestions to pass into the Levenshtein algorithm turned out to be prohibitively expensive. This PR therefore _only_ updates the Levenshtein implementation being used for our existing subdiagnostics, rather than expanding the callsites of the Levenshtein implementation.

## Test plan

Unit tests have been added in `levenshtein.rs`. Some mdtests and snapshots were updated to ensure that they still test what they're meant to be testing, even with the new Levenshtein implementation.

Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
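As a rough illustration of why a thresholded, weighted edit distance can drop `"data"` as a suggestion for `"pair"` while still offering `"entity_id"` for `"old_entity_id"`, here is a minimal Python sketch. It is not the code from `levenshtein.rs`. The names `MOVE_COST`, `CASE_COST`, `weighted_distance`, and `suggest` are hypothetical, the cost values and the "about one third of the characters" cutoff are assumptions modeled on CPython's general approach, and each candidate list holds only the single name shown in the reports above.

```python
from __future__ import annotations

MOVE_COST = 2  # assumed cost of an insertion, deletion, or full substitution
CASE_COST = 1  # assumed cost of a substitution that only changes letter case


def substitution_cost(a: str, b: str) -> int:
    """Cost of replacing character `a` with character `b`."""
    if a == b:
        return 0
    if a.lower() == b.lower():
        return CASE_COST
    return MOVE_COST


def weighted_distance(a: str, b: str) -> int:
    """Row-by-row dynamic-programming edit distance using the weights above."""
    previous = [j * MOVE_COST for j in range(len(b) + 1)]
    for i, ch_a in enumerate(a, start=1):
        current = [i * MOVE_COST]
        for j, ch_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + MOVE_COST,     # delete ch_a
                    current[j - 1] + MOVE_COST,  # insert ch_b
                    previous[j - 1] + substitution_cost(ch_a, ch_b),
                )
            )
        previous = current
    return previous[-1]


def suggest(wrong_name: str, candidates: list[str]) -> str | None:
    """Return the closest candidate, or None if nothing is close enough."""
    best: str | None = None
    best_distance: int | None = None
    for candidate in candidates:
        if candidate == wrong_name:
            continue
        # Assumed cutoff: roughly one third of the characters may change.
        max_distance = (len(candidate) + len(wrong_name) + 3) * MOVE_COST // 6
        distance = weighted_distance(wrong_name, candidate)
        if distance > max_distance:
            continue
        if best_distance is None or distance < best_distance:
            best, best_distance = candidate, distance
    return best


print(suggest("xyxyz", ["xyxy"]))               # xyxy
print(suggest("old_entity_id", ["entity_id"]))  # entity_id
print(suggest("pair", ["data"]))                # None
```

Under these assumed weights, `"pair"` and `"data"` are 6 apart against a cutoff of 3, so no hint is produced, while `"old_entity_id"` and `"entity_id"` are 8 apart against a cutoff of 8 and the hint survives.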