julia
d183ee1b - Improve performance of `ncodeunits(::Char)` (#54001)

Commit

1 year ago

Improve performance of `ncodeunits(::Char)` (#54001)

This improves the performance of `ncodeunits(::Char)` by simply counting the number of non-zero bytes (except for `\0`, which is encoded as all zero bytes). For a performance comparison, see [this gist](https://gist.github.com/Seelengrab/ebb02d4b8d754700c2869de8daf88cad). It shows up to a 10x improvement for collections of `Char`s, and a minor improvement for a single `Char` (with a much smaller spread). The version in this PR is called `nbytesencoded` in the benchmarks.

Correctness has been verified with Supposition.jl, using the existing implementation as an oracle:

```julia
julia> using Supposition

julia> const chars = Data.Characters()

julia> @check max_examples=1_000_000 function bytesenc(i=Data.Integers{UInt32}())
           c = reinterpret(Char, i)
           ncodeunits(c) == nbytesdiv(c)
       end;
Test Summary: | Pass  Total  Time
bytesenc      |    1      1  1.0s

julia> ncodeunits('\0') == nbytesencoded('\0')
true
```

Let's see if CI agrees!

Notably, neither the existing nor the new implementation checks whether the given `Char` is valid, since the only thing that matters is how many bytes are written out.

---------

Co-authored-by: Sukera <Seelengrab@users.noreply.github.com>
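The byte-counting idea can be sketched in a few lines. This is a minimal illustration, not necessarily the exact code merged in #54001. It assumes Julia's in-memory `Char` layout: the UTF-8 bytes are stored left-aligned in a `UInt32` (first byte in the most significant 8 bits), and unused trailing bytes are zero. Under that assumption, "counting the non-zero bytes" amounts to subtracting the number of trailing all-zero bytes from four, with `'\0'` patched up separately. The name `nbytes_sketch` is hypothetical.

```julia
# Hypothetical helper illustrating the technique; not the Base implementation.
function nbytes_sketch(c::Char)
    # Assumed layout: UTF-8 bytes left-aligned in a UInt32, unused bytes zero.
    u = reinterpret(UInt32, c)
    # Every all-zero trailing byte contributes 8 trailing zero bits,
    # so `trailing_zeros(u) >> 3` is the number of unused bytes.
    nonzero = sizeof(UInt32) - (trailing_zeros(u) >> 3)
    # '\0' is all zero bits: trailing_zeros(0x00000000) == 32 gives 0 here,
    # but it still encodes as one byte, so add 1 exactly when u == 0.
    return nonzero + iszero(u)
end

nbytes_sketch.(collect("aé€😀"))  # [1, 2, 3, 4]
sum(nbytes_sketch, "aé€😀")        # 10, same as ncodeunits("aé€😀")
```

Because the result depends only on the bit pattern, the sketch, like the implementations described above, never checks validity. For example, `nbytes_sketch(reinterpret(Char, 0xff000000))` returns 1 even though `0xff` is not a valid UTF-8 lead byte. The computation is branch-free (`iszero` contributes a `Bool`), which is one plausible reason such a formulation vectorizes well over collections of `Char`s.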

References

#54001 - Improve performance of `ncodeunits(::Char)`

Author

Seelengrab

Parents

f870ea0a
