julia
582a6e6a - unroll tuple allequal for performance (#61433)

Commit

17 days ago

unroll tuple allequal for performance (#61433)

In a similar vein to https://github.com/JuliaLang/julia/pull/61426, we can speed up `allequal` by unrolling the loop (up to a cap of 32, chosen by convention).

I suppose this is not a particularly common bottleneck, but we may as well be faster where possible.

master:
```
julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  13.861 ns …   8.303 μs  ┊ GC (min … max): 0.00% … 99.17%
 Time  (median):     18.412 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.582 ns ± 122.345 ns  ┊ GC (mean ± σ):  6.08% ±  1.71%

 13.9 ns      Histogram: log(frequency) by time      83.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 624 samples with 997 evaluations per sample.
 Range (min … max):  16.090 ns … 42.490 μs  ┊ GC (min … max): 0.00% … 73.54%
 Time  (median):     10.193 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):    8.034 μs ±  4.193 μs  ┊ GC (mean ± σ):  0.62% ±  2.94%

 16.1 ns         Histogram: frequency by time         11.3 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 480 samples with 1 evaluation per sample.
 Range (min … max):   9.840 ms … 48.062 ms  ┊ GC (min … max): 0.00% … 76.38%
 Time  (median):     10.312 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.399 ms ±  1.744 ms  ┊ GC (mean ± σ):  0.74% ±  3.49%

 9.84 ms         Histogram: frequency by time         11.5 ms <

 Memory estimate: 1.45 MiB, allocs estimate: 27954.
```

PR:
```
julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 5))
BenchmarkTools.Trial: 10000 samples with 998 evaluations per sample.
 Range (min … max):  14.445 ns … 91.516 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.868 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.809 ns ±  1.603 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 14.4 ns         Histogram: frequency by time         19.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 12))
BenchmarkTools.Trial: 952 samples with 998 evaluations per sample.
 Range (min … max):  15.697 ns … 20.862 μs  ┊ GC (min … max): 0.00% … 62.59%
 Time  (median):      6.387 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):    5.256 μs ±  3.257 μs  ┊ GC (mean ± σ):  0.48% ±  2.84%

 15.7 ns         Histogram: frequency by time         9.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark allequal(t) setup=(t=ntuple(i->rand((1.0, 2)), 56))
BenchmarkTools.Trial: 645 samples with 1 evaluation per sample.
 Range (min … max):  6.847 ms …  23.438 ms  ┊ GC (min … max): 0.00% … 62.03%
 Time  (median):     7.830 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.730 ms ± 827.062 μs  ┊ GC (mean ± σ):  0.29% ±  2.44%

 6.85 ms         Histogram: frequency by time         9.08 ms <

 Memory estimate: 488.16 KiB, allocs estimate: 9482.
```
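For readers unfamiliar with the technique, here is a rough sketch of what "unrolling up to a cap" can look like for tuples. It is not the code from this PR: the names `UNROLL_CAP`, `_allequal_unrolled`, and `my_allequal` are hypothetical, and the real change in Base may be structured quite differently. The sketch only assumes the cap of 32 mentioned above and that `allequal` compares with `isequal`.

```julia
# Hypothetical illustration only; not the implementation from the PR.
const UNROLL_CAP = 32  # cap taken from the commit message

# Compare adjacent elements by recursing on the tail of the tuple.
# For short tuples the compiler can usually inline this recursion into
# straight-line comparisons, which is the effect "unrolling" aims for.
_allequal_unrolled(::Tuple{}) = true
_allequal_unrolled(::Tuple{Any}) = true
_allequal_unrolled(t::Tuple) =
    isequal(t[1], t[2]) && _allequal_unrolled(Base.tail(t))

function my_allequal(t::Tuple)
    # Short tuples take the recursive (unrollable) path.
    length(t) <= UNROLL_CAP && return _allequal_unrolled(t)
    # Longer tuples fall back to an ordinary loop against the first element.
    first_elem = t[1]
    for x in t
        isequal(first_elem, x) || return false
    end
    return true
end
```

With these definitions, `my_allequal((1.0, 1, 1))` is `true` because `isequal(1.0, 1)` holds, and `my_allequal((1, 2, 1))` is `false`. Whether a sketch like this is actually faster depends on inference and inlining for the tuple type in question, so it should be benchmarked rather than assumed.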

References

#61433 - unroll tuple allequal for performance

Author

adienes

Parents

b5ecba00
