Add dims check to triangular mul (#56393)
This adds a dimension check to triangular matrix multiplication methods.
While such checks already exist in the individual branches (occasionally
within `BLAS` methods), having these earlier would permit certain
optimizations, as we are assured that the axes are compatible. This
potentially duplicates the checks, but this is unlikely to be a concern
given how cheap the checks are.
I've also reused the `check_A_mul_B!_sizes` function that is defined in
`bidiag.jl`, instead of hard-coding the checks.
Further, I've replaced some hard-coded loop ranges by the corresponding
`axes` and `first/lastindex` calls. These are identical under the
1-based indexing assumption, but the `axes` variants are easier to read
and reason about.