Reduce branches in 2x2 and 3x3 stable_muladdmul for standard cases (#54951)
If `alpha isa Bool` and the call site already knows that `alpha != 0`,
then `alpha` must be `true`, so we may hard-code `alpha = true` in the
`MulAddMul` constructor. This eliminates the `!isone(alpha)` branches in
`@stable_muladdmul` and reduces first-call (compilation) latency in
matrix multiplication.
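The idea can be illustrated with a toy sketch. This is not the actual `LinearAlgebra` code: `ToyMulAddMul`, `branchy` and `fewer_branches` are hypothetical names, and the sketch only assumes that the flags `isone(alpha)` and `iszero(beta)` are encoded as type parameters, so that each combination needs its own branch to stay type-stable.
```julia
# Toy stand-in for a MulAddMul-like type (hypothetical, for illustration only).
# The flags ais1 = isone(alpha) and bis0 = iszero(beta) live in the type.
struct ToyMulAddMul{ais1, bis0, TA, TB}
    alpha::TA
    beta::TB
end

# General case: four branches, one per (ais1, bis0) combination.
function branchy(alpha::TA, beta::TB) where {TA, TB}
    if isone(alpha)
        if iszero(beta)
            return ToyMulAddMul{true, true, TA, TB}(alpha, beta)
        else
            return ToyMulAddMul{true, false, TA, TB}(alpha, beta)
        end
    else
        if iszero(beta)
            return ToyMulAddMul{false, true, TA, TB}(alpha, beta)
        else
            return ToyMulAddMul{false, false, TA, TB}(alpha, beta)
        end
    end
end

# Assumption: the caller has already handled alpha == false. A nonzero Bool
# is necessarily true, so only the iszero(beta) branch remains.
function fewer_branches(alpha::Bool, beta::TB) where {TB}
    if iszero(beta)
        return ToyMulAddMul{true, true, Bool, TB}(true, beta)
    else
        return ToyMulAddMul{true, false, Bool, TB}(true, beta)
    end
end
```
With fewer branches, fewer specializations of the downstream kernel need to be compiled, which is where the latency saving would come from in this sketch.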
```julia
julia> using LinearAlgebra
julia> A = rand(2,2);
julia> @time A * A;
0.596825 seconds (1.05 M allocations: 53.458 MiB, 5.94% gc time, 99.95% compilation time) # nightly v"1.12.0-DEV.789"
0.473140 seconds (793.52 k allocations: 39.946 MiB, 3.28% gc time, 99.93% compilation time) # this PR
```
In a separate session,
```julia
julia> using LinearAlgebra
julia> A = rand(2,2);
julia> @time A * Symmetric(A);
0.829252 seconds (2.37 M allocations: 120.051 MiB, 1.98% gc time, 99.98% compilation time) # nightly v"1.12.0-DEV.789"
0.712953 seconds (2.06 M allocations: 103.951 MiB, 2.17% gc time, 99.98% compilation time) # this PR
```