julia
f93138ed - Specialize `isbanded` for `StridedMatrix` (#56487)

Commit
1 year ago
Specialize `isbanded` for `StridedMatrix` (#56487)

This improves performance, as the loops in `istriu` and `istril` may be fused to improve cache-locality.
This also changes the quick-return behavior, and only returns after the check over all the upper or lower bands for a column is complete.

```julia
julia> using LinearAlgebra

julia> A = zeros(2, 10_000);

julia> @btime isdiag($A);
  32.682 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1593"
  9.481 μs (0 allocations: 0 bytes) # this PR

julia> A = zeros(10_000, 2);

julia> @btime isdiag($A);
  10.288 μs (0 allocations: 0 bytes) # nightly
  2.579 μs (0 allocations: 0 bytes) # this PR

julia> A = zeros(100, 100);

julia> @btime isdiag($A);
  6.616 μs (0 allocations: 0 bytes) # nightly
  3.075 μs (0 allocations: 0 bytes) # this PR

julia> A = diagm(0=>1:100); A[3,4] = 1;

julia> @btime isdiag($A);
  2.759 μs (0 allocations: 0 bytes) # nightly
  85.371 ns (0 allocations: 0 bytes) # this PR
```

A similar change is added to `istriu`/`istril` as well, so that

```julia
julia> A = zeros(2, 10_000);

julia> @btime istriu($A); # trivial
  7.358 ns (0 allocations: 0 bytes) # nightly
  13.779 ns (0 allocations: 0 bytes) # this PR

julia> @btime istril($A);
  33.464 μs (0 allocations: 0 bytes) # nightly
  9.476 μs (0 allocations: 0 bytes) # this PR

julia> A = zeros(10_000, 2);

julia> @btime istriu($A);
  10.020 μs (0 allocations: 0 bytes) # nightly
  2.620 μs (0 allocations: 0 bytes) # this PR

julia> @btime istril($A); # trivial
  6.793 ns (0 allocations: 0 bytes) # nightly
  14.473 ns (0 allocations: 0 bytes) # this PR

julia> A = zeros(100, 100);

julia> @btime istriu($A);
  3.435 μs (0 allocations: 0 bytes) # nightly
  1.637 μs (0 allocations: 0 bytes) # this PR

julia> @btime istril($A);
  3.353 μs (0 allocations: 0 bytes) # nightly
  1.661 μs (0 allocations: 0 bytes) # this PR
```

---------

Co-authored-by: Daniel Karrasch <daniel.karrasch@posteo.de>
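The fused, column-at-a-time check described in the message can be illustrated with a minimal sketch. This is not the code added by the PR. The name `isdiag_fused_sketch` is hypothetical, and the function assumes a 1-based `StridedMatrix`. It visits each column once, tests the entries above and below the diagonal in the same pass, and returns early only after the whole column has been examined.

```julia
using LinearAlgebra

# Hypothetical helper, not part of LinearAlgebra: returns true if every
# entry off the main diagonal is zero. Each column is traversed once, so
# the "upper" and "lower" checks share the same column of memory.
function isdiag_fused_sketch(A::StridedMatrix)
    m, n = size(A)
    for j in 1:n
        ok = true
        # Entries above the diagonal in column j: rows 1 .. min(j-1, m).
        for i in 1:min(j - 1, m)
            ok &= iszero(A[i, j])
        end
        # Entries below the diagonal in column j: rows j+1 .. m.
        for i in (j + 1):m
            ok &= iszero(A[i, j])
        end
        # Quick return happens only after the full column is checked.
        ok || return false
    end
    return true
end

# Usage: the sketch should agree with the library function.
A = diagm(0 => 1:100); A[3, 4] = 1
isdiag_fused_sketch(A) == isdiag(A)
```

Accumulating into `ok` with `&=` instead of branching on every element keeps the inner loops free of early exits, which is one plausible reason a per-column return can be faster than a per-element one. Whether the compiler vectorizes these loops is an assumption and is not verified here.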