Avoid using zero for the eltype in `tr(::Matrix)` (#55519)
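This lets us compute `tr` for `Matrix`es whose `eltype` has no usable `zero` (for instance `zero(Matrix{Int})` is undefined, since the size of the zero matrix is unknown), as long as the diagonal elements can be summed.
For example, the following now works: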
```julia
julia> using LinearAlgebra

julia> M = fill([1 2; 3 4], 2, 2)
2×2 Matrix{Matrix{Int64}}:
 [1 2; 3 4]  [1 2; 3 4]
 [1 2; 3 4]  [1 2; 3 4]

julia> tr(M)
2×2 Matrix{Int64}:
 2  4
 6  8
```
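The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `trace_nozero` is a hypothetical helper that seeds the running sum with the first diagonal element instead of `zero(eltype(A))`, so `zero` is only needed for an empty matrix (an assumption about how the empty case is handled).

```julia
using LinearAlgebra  # for `checksquare` and, in the usage below, the built-in `tr`

# Hypothetical helper (not the PR's implementation): sum the diagonal starting
# from A[1, 1] rather than from zero(eltype(A)).
function trace_nozero(A::AbstractMatrix)
    n = LinearAlgebra.checksquare(A)      # throws DimensionMismatch if A is not square
    n == 0 && return zero(eltype(A))      # assumption: the empty case still needs a zero
    s = A[1, 1]
    for i in 2:n
        s += A[i, i]                      # rebinds `s`; A[1, 1] itself is never mutated
    end
    return s
end

M = fill([1 2; 3 4], 2, 2)
trace_nozero(M)  # [2 4; 6 8]
```

Because `s += A[i, i]` allocates a new value rather than updating in place, the aliased entries produced by `fill` are left untouched.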
Also, indexing the diagonal with linear rather than Cartesian indices appears to give a slight speed-up for small to mid-sized matrices (about 15% in the benchmark below):
```julia
julia> using LinearAlgebra, BenchmarkTools

julia> A = rand(1000,1000);

julia> @btime tr($A);
  1.796 μs (0 allocations: 0 bytes) # nightly
  1.524 μs (0 allocations: 0 bytes) # This PR
```
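To make "linear indexing" concrete: in a column-major `n×n` `Matrix`, element `(i, i)` lives at linear index `(i - 1) * n + i`, so the diagonal is the strided range `1:(n + 1):n^2` (the same indices `diagind(A)` returns). The sketch below walks that range directly. `trace_linear` is a hypothetical helper for illustration only; it has not been benchmarked here and is not claimed to match the PR's code or timings.

```julia
using LinearAlgebra  # for `checksquare`

# Hypothetical helper: trace of a dense column-major Matrix via linear indices.
# Diagonal linear indices are 1, n + 2, 2n + 3, ..., n^2 (stride n + 1).
function trace_linear(A::Matrix)
    n = LinearAlgebra.checksquare(A)
    n == 0 && return zero(eltype(A))
    s = A[1]                               # linear index of (1, 1)
    for k in (n + 2):(n + 1):(n * n)       # linear indices of (2, 2), ..., (n, n)
        s += A[k]
    end
    return s
end

A = rand(1000, 1000)
trace_linear(A) ≈ tr(A)
```

Each step is a single offset into contiguous memory, which avoids computing a column-major offset from two Cartesian subscripts per element.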