LinearAlgebra: use band index in structured broadcast (#54075)
This adds a new `BandIndex` type, internal to `LinearAlgebra`, that
parallels `CartesianIndex`: it stores the index of a band and the linear
index of an element along that band. If the band index is a
compile-time constant (which is often the case with
`Diagonal`/`Bidiagonal`/`Tridiagonal` matrices), constant propagation
may eliminate the branches in indexing into the matrix and forward the
lookup directly to the corresponding diagonal. This is particularly
important in broadcasting over these matrices, which acts band-wise.
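To illustrate the idea, here is a minimal sketch of a band-based lookup for a `Tridiagonal`. The names `SketchBandIndex`, `to_cartesian` and `band_getindex` are hypothetical and are not the ones used in this PR; the sketch only assumes the public `dl`, `d` and `du` fields of `Tridiagonal`.

```julia
using LinearAlgebra

# Hypothetical stand-in for the internal type (not the PR's definition).
# `band` is the diagonal offset (0 = main, 1 = first superdiagonal,
# -1 = first subdiagonal); `index` is the position along that band.
struct SketchBandIndex
    band::Int
    index::Int
end

# Convert a band index to the equivalent (row, column) pair.
function to_cartesian(b::SketchBandIndex)
    row = b.index + max(0, -b.band)
    col = b.index + max(0, b.band)
    return CartesianIndex(row, col)
end

# Band-aware lookup: each branch reads straight from the vector that
# stores the band. If `b.band` is known at compile time, the compiler
# may drop the branches that cannot be taken.
@inline function band_getindex(T::Tridiagonal, b::SketchBandIndex)
    if b.band == 0
        return T.d[b.index]
    elseif b.band == 1
        return T.du[b.index]
    elseif b.band == -1
        return T.dl[b.index]
    else
        return zero(eltype(T))  # outside the three stored bands
    end
end

T = Tridiagonal([1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0])
b = SketchBandIndex(1, 2)  # second element of the superdiagonal

# Both expressions read the same element, T[2, 3].
band_getindex(T, b) == T[to_cartesian(b)]
```

With a Cartesian index, the band has to be recovered from `col - row` on every access; with a band index it is carried explicitly, which is what gives constant propagation a chance to pick the right vector ahead of time.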
An example of an improvement in performance with this PR:
```julia
julia> using LinearAlgebra, BenchmarkTools # BenchmarkTools provides @btime
julia> T = Tridiagonal(rand(893999), rand(894000), rand(893999)); # a large matrix
julia> @btime $T .+ $T;
5.387 ms (10 allocations: 20.46 MiB) # v"1.12.0-DEV.337"
2.872 ms (10 allocations: 20.46 MiB) # This PR
julia> @btime $T + $T; # reference
2.885 ms (10 allocations: 20.46 MiB)
```
This makes the broadcast operation as fast as the sum `T + T`, which
adds the diagonals directly.
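As a small illustration of what "adds the diagonals directly" means, the sketch below builds the sum band by band and compares it with both `+` and the structured broadcast. It is only a check of equivalence on a tiny matrix, not the code path used in `LinearAlgebra`.

```julia
using LinearAlgebra

T = Tridiagonal([1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0])

# Band-by-band sum: one vector addition per stored diagonal.
S = Tridiagonal(T.dl .+ T.dl, T.d .+ T.d, T.du .+ T.du)

S == T + T                # same entries as the matrix sum
S == T .+ T               # and as the structured broadcast
(T .+ T) isa Tridiagonal  # the broadcast keeps the banded structure
```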
I'm not 100% certain why this works as well as it does, as the constant
band index may get lost in the `newindex` computation. I suspect that
even when the constant is not propagated, branch prediction keeps the
remaining band checks cheap, because the band stays the same while a
given band is being processed.