Refactor char/string and byte search (#54667)
This is a refactoring of `base/string/search.jl`. It is purely internal
and comes with no changes in behaviour. It's based on #54593 and #54579,
so those need to be merged first; this PR will then be rebased onto
master.
Included changes are:
* The char/string search functions now pass the last byte of the needle
to memchr, not the first byte. Because the last bytes are more varied,
this is much faster on small non-ASCII alphabets (like searching Greek or
Cyrillic text) and somewhat faster on large non-ASCII ones (like
Japanese). Speed on ASCII alphabets (like English) is unchanged.
* Several unused or redundant methods have been removed.
* Moved boundschecks from the inner `_search` and `_rsearch` functions
to the outer top-level functions that call them. This is because the
former may be called in a loop where repeated boundschecking is
needless. This should speed up search a bit.
* Char/string search functions are now implemented in terms of an
internal lazy iterator. This allows `findall` and `findnext` to share
implementation, and will also make it trivially easy to implement a lazy
findall in the future (see #43737).
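
To illustrate the "last bytes are more varied" point, here is a minimal sketch that counts the distinct first and last UTF-8 bytes across the alphabets used in the benchmarks below. The helper `byte_variety` is hypothetical and written only for this illustration; it is not part of `base/string/search.jl`.

```julia
# Hypothetical helper: count how many distinct first and last UTF-8 bytes
# occur among the characters of `alphabet`.
function byte_variety(alphabet)
    first_bytes = Set{UInt8}()
    last_bytes = Set{UInt8}()
    for c in alphabet
        units = codeunits(string(c))  # UTF-8 encoding of one character
        push!(first_bytes, first(units))
        push!(last_bytes, last(units))
    end
    return (nfirst = length(first_bytes), nlast = length(last_bytes))
end

greek = byte_variety('Α':'ψ')    # two-byte characters: (nfirst = 2, nlast = 56)
english = byte_variety('A':'y')  # one-byte characters: (nfirst = 57, nlast = 57)
```

With only two distinct lead bytes in this Greek range, a memchr on the first byte of the needle stops at a large share of the characters in the haystack, whereas a memchr on the last byte stops far less often. For single-byte ASCII text the first and last bytes coincide, which fits the unchanged timings for English.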
IMO there is still more work to be done on this file, but that requires
a decision on #43737, #54581 or #54584.
## Benchmarks
```julia
using BenchmarkTools
using Random
rng = Xoshiro(55)
greek = join(rand(rng, 'Α':'ψ', 100000)) * 'ω'
@btime findfirst('ω', greek)
@btime findfirst(==('\xce'), greek)
english = join(rand(rng, 'A':'y', 100000)) * 'z'
@btime findfirst('z', english)
@btime findall('A', english)
@btime findall('\xff', english)
nothing
```
1.11.0-beta2:
```
100.049 μs (1 allocation: 16 bytes)
474.084 μs (0 allocations: 0 bytes)
689.110 ns (1 allocation: 16 bytes)
93.536 μs (9 allocations: 21.84 KiB)
72.316 μs (1 allocation: 32 bytes)
```
This PR:
```
1.319 μs (1 allocation: 16 bytes)
398.011 μs (0 allocations: 0 bytes)
681.550 ns (1 allocation: 16 bytes)
8.867 μs (8 allocations: 21.81 KiB)
683.962 ns (1 allocation: 32 bytes)
```