julia
7de55850 - Use `size` based `MultiplicativeInverse` to speedup sequential access of `ReshapedArray` (#43518)

Commit
220 days ago
Use `size` based `MultiplicativeInverse` to speedup sequential access of `ReshapedArray` (#43518)

This performance difference was found when working on #42736.
Currently, `ReshapedArray` uses a stride-based `MultiplicativeInverse` to speed up index transformation.
For example, for `a::AbstractArray{T,3}` and `b = vec(a)`, the index transformation is equivalent to:

```julia
offset = i - 1 # b[i]
d1, r1 = divrem(offset, stride(a, 3)) # stride(a, 3) = size(a, 1) * size(a, 2)
d2, r2 = divrem(r1, stride(a, 2))     # stride(a, 2) = size(a, 1)
CartesianIndex(r2 + 1, d2 + 1, d1 + 1) # a has one-based axes
```

(Each `stride` is replaced with a `MultiplicativeInverse` to accelerate `divrem`.)

This PR replaces the above machinery with:

```julia
offset = i - 1
d1, r1 = divrem(offset, size(a, 1))
d2, r2 = divrem(d1, size(a, 2))
CartesianIndex(r1 + 1, r2 + 1, d2 + 1)
```

For random access, the two should have the same computational cost. But for sequential access, like `sum(b)`, the `size`-based transformation seems faster.
To avoid a bottleneck from IO, use `reshape(::CartesianIndices, x...)` to benchmark:

```julia
using BenchmarkTools # provides @btime

f(x) = let r = 0
    for i in eachindex(x)
        @inbounds r |= +(x[i].I...)
    end
    r
end
a = CartesianIndices((99,100,101));
@btime f(vec($a)); #2.766 ms --> 2.591 ms
@btime f(reshape($a,990,1010)); #3.412 ms --> 2.626 ms
@btime f(reshape($a,33,300,101)); #3.422 ms --> 2.342 ms
```

I haven't looked into the reason for this performance difference.
Besides the speedup, this also makes it possible to reuse the `MultiplicativeInverse` in some cases (like #42736).
So I think it might be useful?

---------

Co-authored-by: Andy Dienes <51664769+adienes@users.noreply.github.com>
Co-authored-by: Andy Dienes <andydienes@gmail.com>
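
As a sanity check on the two transformations described in the message, the following is a minimal sketch that compares both against `CartesianIndices` for a small 3-d shape. The function names `stride_based` and `size_based` are hypothetical, and plain `divrem` stands in for the `MultiplicativeInverse` division, so this illustrates the arithmetic only and is not the code in `ReshapedArray`.

```julia
# Hypothetical helpers; plain `divrem` replaces the `MultiplicativeInverse` division.

# Stride-based: divide by the largest stride first, then keep splitting the remainder.
function stride_based(i::Int, sz::NTuple{3,Int})
    s2 = sz[1]            # stride of dimension 2
    s3 = sz[1] * sz[2]    # stride of dimension 3
    offset = i - 1
    d1, r1 = divrem(offset, s3)
    d2, r2 = divrem(r1, s2)
    return CartesianIndex(r2 + 1, d2 + 1, d1 + 1)
end

# Size-based: peel off the fastest dimension first, then keep splitting the quotient.
function size_based(i::Int, sz::NTuple{3,Int})
    offset = i - 1
    d1, r1 = divrem(offset, sz[1])
    d2, r2 = divrem(d1, sz[2])
    return CartesianIndex(r1 + 1, r2 + 1, d2 + 1)
end

sz = (3, 4, 5)
# Column-major linear order of all Cartesian indices, used as the reference.
reference = vec(collect(CartesianIndices(sz)))

# Both transformations should agree with the reference at every linear index.
all_match = all(eachindex(reference)) do i
    stride_based(i, sz) == reference[i] == size_based(i, sz)
end
println(all_match)
```

Note that the size-based variant feeds each quotient into the next division, so every divisor is a plain `size(a, k)` rather than a product of sizes; this is what would let the same inverse be shared in the cases the message mentions.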