julia
59320c62 - Refactor char/string and byte search (#54667)

This is a refactoring of `base/string/search.jl`. It is purely internal and comes with no changes in behaviour. It's based on #54593 and #54579, so those need to be merged first; this PR will then be rebased onto master.

Included changes:

* The char/string search functions now pass the *last* byte of the needle's UTF-8 encoding to `memchr`, not the first byte. Because last bytes are more varied, this is much faster on small non-ASCII alphabets (like Greek or Cyrillic text) and somewhat faster on large non-ASCII ones (like Japanese). Speed on ASCII alphabets (like English) is unchanged.
* Several unused or redundant methods have been removed.
* Bounds checks have been moved from the inner `_search` and `_rsearch` functions to the outer top-level functions that call them, because the inner functions may be called in a loop where repeated bounds checking is needless. This should speed up search a bit.
* Char/string search functions are now implemented in terms of an internal lazy iterator. This allows `findall` and `findnext` to share an implementation, and will also make it trivially easy to implement a lazy `findall` in the future (see #43737).

IMO there is still more work to be done on this file, but that requires a decision to be made on #43737, #54581 or #54584.

## Benchmarks

```julia
using BenchmarkTools
using Random

rng = Xoshiro(55)

greek = join(rand(rng, 'Α':'ψ', 100000)) * 'ω'
@btime findfirst('ω', greek)
@btime findfirst(==('\xce'), greek)

english = join(rand(rng, 'A':'y', 100000)) * 'z'
@btime findfirst('z', english)
@btime findall('A', english)
@btime findall('\xff', english)
nothing
```

1.11.0-beta2:

```
100.049 μs (1 allocation: 16 bytes)
474.084 μs (0 allocations: 0 bytes)
689.110 ns (1 allocation: 16 bytes)
93.536 μs (9 allocations: 21.84 KiB)
72.316 μs (1 allocation: 32 bytes)
```

This PR:

```
1.319 μs (1 allocation: 16 bytes)
398.011 μs (0 allocations: 0 bytes)
681.550 ns (1 allocation: 16 bytes)
8.867 μs (8 allocations: 21.81 KiB)
683.962 ns (1 allocation: 32 bytes)
```
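Why the last byte helps: in Greek text nearly every character's UTF-8 encoding begins with the byte 0xCE or 0xCF, so a `memchr` for the first byte stops at almost every character, while the last byte differs from character to character. The sketch below searches for a `Char` by scanning for the last byte of its encoding and then checking the bytes before it. It is a hypothetical illustration, not the code in `base/string/search.jl`: `findfirst_lastbyte` is an invented name, `findnext(==(byte), codeunits(s), i)` merely stands in for the `memchr` call, and it assumes `s` is valid UTF-8.

```julia
# Hypothetical sketch (not Base's implementation): find the first occurrence of
# `c` in `s` by scanning for the LAST byte of its UTF-8 encoding, then checking
# that the preceding bytes match. Assumes `s` is valid UTF-8.
function findfirst_lastbyte(c::Char, s::String)
    needle = codeunits(string(c))        # UTF-8 bytes of `c`
    n = length(needle)
    cu = codeunits(s)
    length(cu) < n && return nothing     # haystack too short to contain `c`
    last_byte = needle[n]
    i = n                                # a match's last byte can't occur before index n
    while true
        # Stand-in for memchr: next position of `last_byte` at or after `i`
        j = findnext(==(last_byte), cu, i)
        j === nothing && return nothing
        start = j - n + 1
        # Confirm the leading bytes of the encoding match as well
        if all(k -> cu[start + k - 1] == needle[k], 1:n-1)
            return start
        end
        i = j + 1                        # resume after the false candidate
    end
end
```

For example, searching for `'ω'` (bytes 0xCF 0x89) in `"ɉω"` first lands on the 0x89 that ends `'ɉ'` (0xC9 0x89), rejects it because the lead byte differs, and then returns 3.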
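Moving bounds checks out of the inner search functions can be sketched on a plain byte vector. The names `_search_unchecked`, `findnext_byte` and `findall_byte` are hypothetical, not Base functions. The public `findnext_byte` validates the start index once; `findall_byte` calls the unchecked helper in a loop, and since each resume index is `k + 1` for a previous hit `k`, it is in range by construction, so rechecking it on every call would be wasted work.

```julia
# Hypothetical inner search: the CALLER guarantees 1 <= i <= length(a) + 1,
# so no bounds check is done here. Returns 0 when `b` is not found.
function _search_unchecked(a::Vector{UInt8}, b::UInt8, i::Int)
    @inbounds for k in i:length(a)
        a[k] == b && return k
    end
    return 0
end

# Public entry point: validate the start index once, then delegate.
function findnext_byte(a::Vector{UInt8}, b::UInt8, i::Integer)
    1 <= i <= length(a) + 1 || throw(BoundsError(a, i))
    k = _search_unchecked(a, b, Int(i))
    return k == 0 ? nothing : k
end

# Repeated search: every resume index is k + 1 for a valid hit k,
# so the inner helper never needs to revalidate it.
function findall_byte(a::Vector{UInt8}, b::UInt8)
    result = Int[]
    i = 1
    while true
        k = _search_unchecked(a, b, i)
        k == 0 && break
        push!(result, k)
        i = k + 1
    end
    return result
end
```

The trade-off is that `_search_unchecked` is only safe when called with a valid index, so it should stay internal.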
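Sharing one lazy iterator between `findnext` and `findall` might look roughly like the sketch below. `CharMatches`, `my_findnext` and `my_findall` are hypothetical names, and each step simply delegates to Base's public `findnext` rather than to the internal search routines. `my_findnext` takes the first element of the iterator and `my_findall` collects all of it; handing the iterator itself to the caller would already be a lazy `findall` (the idea tracked in #43737).

```julia
# Hypothetical lazy iterator over the byte indices where `c` occurs in `s`,
# beginning the scan at index `start`.
struct CharMatches
    c::Char
    s::String
    start::Int
end

Base.IteratorSize(::Type{CharMatches}) = Base.SizeUnknown()
Base.eltype(::Type{CharMatches}) = Int

function Base.iterate(it::CharMatches, i::Int = it.start)
    j = findnext(==(it.c), it.s, i)   # one step of the search
    j === nothing && return nothing
    return (j, nextind(it.s, j))      # yield the hit; resume just after it
end

# Both entry points reuse the same iterator.
function my_findnext(c::Char, s::String, i::Integer)
    y = iterate(CharMatches(c, s, Int(i)))
    return y === nothing ? nothing : first(y)
end

my_findall(c::Char, s::String) = collect(CharMatches(c, s, 1))
```

Because the iterator is lazy, `collect(Iterators.take(CharMatches('b', "abcabc", 1), 1))` stops after the first match instead of scanning the whole string.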

References: #54667 - Refactor char/string and byte search

Author: jakobnissen

Parents: 3be18c35
