Replace `MulAddMul` by `alpha,beta` in `__muldiag` (#56360)
This PR replaces the `MulAddMul` arguments with `alpha, beta` pairs in the
multiplication methods involving `Diagonal` matrices, and constructs the
`MulAddMul` objects only at the point where they are needed. This reduces
latency (compilation time on the first call) without affecting runtime
performance.
```julia
julia> using LinearAlgebra, BenchmarkTools
julia> D = Diagonal(1:2000); A = rand(size(D)...); C = similar(A);
julia> @time mul!(C, A, D, 1, 2); # first-run latency is reduced
0.129741 seconds (180.18 k allocations: 9.607 MiB, 88.87% compilation time) # nightly v"1.12.0-DEV.1505"
0.083005 seconds (146.68 k allocations: 7.442 MiB, 82.94% compilation time) # this PR
julia> @btime mul!($C, $A, $D, 1, 2); # runtime performance is unaffected
4.983 ms (0 allocations: 0 bytes) # nightly
4.938 ms (0 allocations: 0 bytes) # this PR
```
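The mechanism can be illustrated with a minimal, self-contained sketch in plain Julia. It does not use LinearAlgebra's internals: `ToyMulAddMul`, `_toy_rmul_diag_kernel!` and `toy_mul_diag!` are hypothetical names, and `ToyMulAddMul` is a simplified stand-in that, like the internal `MulAddMul` as we understand it, records `isone(alpha)` and `iszero(beta)` as type parameters. The outer method takes plain `alpha, beta` and builds the flag-carrying object immediately before the kernel, so only the kernel is specialized per flag combination.

```julia
using LinearAlgebra

# Hypothetical stand-in for LinearAlgebra's internal `MulAddMul` (not the real
# type). The type parameters record whether alpha is one and beta is zero.
struct ToyMulAddMul{ais1, bis0, TA, TB}
    alpha::TA
    beta::TB
end

# The flags are computed from runtime values, so the concrete type (one of four
# flag combinations) is only known after this call.
ToyMulAddMul(alpha, beta) =
    ToyMulAddMul{isone(alpha), iszero(beta), typeof(alpha), typeof(beta)}(alpha, beta)

# Combine a product term x = A[i,j]*d[j] with the existing entry c = C[i,j] as
# x*alpha + c*beta, with the trivial cases resolved by dispatch on the flags.
(m::ToyMulAddMul{true, true})(x, c) = x
(m::ToyMulAddMul{false, true})(x, c) = x * m.alpha
(m::ToyMulAddMul{true, false})(x, c) = x + c * m.beta
(m::ToyMulAddMul{false, false})(x, c) = x * m.alpha + c * m.beta

# Innermost kernel: the only method that is specialized on the flag combination.
function _toy_rmul_diag_kernel!(C, A, d, m::ToyMulAddMul)
    for j in axes(A, 2)
        dj = d[j]
        for i in axes(A, 1)
            C[i, j] = m(A[i, j] * dj, C[i, j])
        end
    end
    return C
end

# Outer entry point: accepts plain `alpha, beta` and builds the flag-carrying
# object only right before the kernel, so this method is not specialized on it.
function toy_mul_diag!(C::AbstractMatrix, A::AbstractMatrix, D::Diagonal,
                       alpha::Number, beta::Number)
    size(C) == size(A) || throw(DimensionMismatch("C and A must have the same size"))
    size(A, 2) == size(D, 1) || throw(DimensionMismatch("A and D have incompatible sizes"))
    return _toy_rmul_diag_kernel!(C, A, D.diag, ToyMulAddMul(alpha, beta))
end

D = Diagonal([1.0, 2.0, 3.0])
A = [1.0 2.0 3.0; 4.0 5.0 6.0]
C = ones(2, 3)
toy_mul_diag!(C, A, D, 2, 3)  # C = A*D*2 + C*3, as mul!(C, A, D, 2, 3) would compute
# C == [5.0 11.0 21.0; 11.0 23.0 39.0]
```

If the object were instead built in the outermost method and threaded through several intermediate wrappers, each wrapper would be compiled separately for every flag combination it sees; deferring construction confines that duplication to the innermost kernel, which is one plausible source of the reduced compilation time measured above.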
This PR sets the stage for a similar change to the methods for
`Bidiagonal`/`Tridiagonal` matrices, which would lead to a larger
reduction in latency.