LinearAlgebra: copyto! between banded matrix types (#54041)
This specialized `copyto!` for combinations of banded structured matrix
types so that the copy may be O(N) instead of the fallback O(N^2)
implementation.
E.g.:
```julia
julia> T = Tridiagonal(zeros(999), zeros(1000), zeros(999));
julia> B = Bidiagonal(ones(1000), fill(2.0, 999), :U);
julia> @btime copyto!($T, $B);
1.927 ms (0 allocations: 0 bytes) # master
229.870 ns (0 allocations: 0 bytes) # PR
```
This also changes the `copyto!` implementation for mismatched matrix
sizes, bringing it closer to the docstring. So, the following works on
master:
```julia
julia> Ddest = Diagonal(zeros(4));
julia> Dsrc = Diagonal(ones(2));
julia> copyto!(Ddest, Dsrc)
4×4 Diagonal{Float64, Vector{Float64}}:
1.0 ⋅ ⋅ ⋅
⋅ 1.0 ⋅ ⋅
⋅ ⋅ 0.0 ⋅
⋅ ⋅ ⋅ 0.0
```
but this won't work anymore with this PR. This was inconsistent anyway,
as materializing the matrices produces a different result, which
shouldn't be the case:
```julia
julia> copyto!(Matrix(Ddest), Dsrc)
4×4 Matrix{Float64}:
1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0
1.0 0.0 0.0 0.0
```
After this PR, the way to carry out the copy would be
```julia
julia> copyto!(Ddest, CartesianIndices(Dsrc), Dsrc, CartesianIndices(Dsrc))
4×4 Diagonal{Float64, Vector{Float64}}:
1.0 ⋅ ⋅ ⋅
⋅ 1.0 ⋅ ⋅
⋅ ⋅ 0.0 ⋅
⋅ ⋅ ⋅ 0.0
```
This change fixes https://github.com/JuliaLang/julia/issues/46005.
Also fixes https://github.com/JuliaLang/julia/issues/53997
After this,
```julia
julia> @btime copyto!(C, B) setup=(n = 1_000; B = Bidiagonal(randn(n), randn(n-1), :L); C = Bidiagonal(randn(n), randn(n-1), :L));
158.405 ns (0 allocations: 0 bytes)
julia> @btime copyto!(C, B) setup=(n = 10_000; B = Bidiagonal(randn(n), randn(n-1), :L); C = Bidiagonal(randn(n), randn(n-1), :L));
4.706 μs (0 allocations: 0 bytes)
julia> @btime copyto!(C, B) setup=(n = 100_000; B = Bidiagonal(randn(n), randn(n-1), :L); C = Bidiagonal(randn(n), randn(n-1), :L));
120.880 μs (0 allocations: 0 bytes)
```
which is roughly linear scaling.
Taken along with https://github.com/JuliaLang/julia/pull/54027, the
speed-ups would also apply to the adjoints of banded matrices.