[MLIR] Minor fixes to FoldTransposeBroadcast rewrite (#140083)
This patch contains two minor changes, which I believe reflect the original
author's intent.
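* when folding `transpose(broadcast(x))`, emit `broadcast(x)` instead of
`broadcast(broadcast(x))`. The latter can fail verification, because the
outer broadcast goes from the untransposed shape to the transposed one
(below, from `vector<2x3x...>` to `vector<3x2x...>`, whose leading
dimensions differ). This shows up as transient verifier failures with
`mlir-opt --debug`, e.g.: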
```
mlir-asm-printer: 'func.func' failed to verify and will be printed in generic form
"func.func"() <{function_type = (vector<4x1x1x7xi8>) -> vector<3x2x4x5x6x7xi8>, sym_name = "broadcast_transpose_mixed_example"}> ({
^bb0(%arg0: vector<4x1x1x7xi8>):
%0 = "vector.broadcast"(%arg0) : (vector<4x1x1x7xi8>) -> vector<2x3x4x5x6x7xi8>
%1 = "vector.broadcast"(%0) : (vector<2x3x4x5x6x7xi8>) -> vector<3x2x4x5x6x7xi8>
"func.return"(%1) : (vector<3x2x4x5x6x7xi8>) -> ()
}) : () -> ()
```
* when checking permutation groups, the variable `low` was set to zero only
once, so the check was quadratic in the rank. It looks like the intent was
for `low` to track the beginning of each dimension group. (The check was
nevertheless correct.)
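The group check from the second change can be sketched outside MLIR as a small standalone Python function. This is a hedged illustration, not the pattern's C++ code. It assumes the grouping rule the pattern uses: each non-1 input dimension forms its own group, while runs of 1s (together with the leading dimensions added by the broadcast) form shared groups. The function name `permutation_is_group_local`, the `reset_low` switch and the comparison counter are hypothetical, added only to contrast the old and fixed behavior. The example shape `4x1x1x7` and the swap of the two leading dimensions mirror the log above.

```python
from typing import Sequence, Tuple


def permutation_is_group_local(input_shape: Sequence[int],
                               output_rank: int,
                               perm: Sequence[int],
                               reset_low: bool = True) -> Tuple[bool, int]:
    """Check that `perm` (a transpose permutation over the broadcast result)
    only moves dimensions within their own group.

    Returns (ok, checks), where `checks` counts inspected permutation entries.
    `reset_low=False` mimics the old behavior, where `low` stayed at zero.
    """
    input_rank = len(input_shape)
    delta_rank = output_rank - input_rank  # leading dims added by the broadcast
    low = 0  # start of the current group, in result-dimension numbering
    checks = 0
    for input_index in range(input_rank):
        not_one = input_shape[input_index] != 1
        prev_not_one = input_index != 0 and input_shape[input_index - 1] != 1
        if not_one or prev_not_one:
            # A group boundary: the group [low, high) ends before this dim.
            high = input_index + delta_rank
            for i in range(low, high):
                checks += 1
                if not (low <= perm[i] < high):
                    return False, checks
            if reset_low:
                low = high  # the fix: the next group starts where this one ends
    # The last group [low, output_rank) is not checked: since `perm` is a
    # bijection, if it escaped its group an earlier group would have failed.
    return True, checks


# Input vector<4x1x1x7> broadcast to rank 6, then the two leading dims swapped.
print(permutation_is_group_local([4, 1, 1, 7], 6, [1, 0, 2, 3, 4, 5]))
# prints (True, 5)
print(permutation_is_group_local([4, 1, 1, 7], 6, [1, 0, 2, 3, 4, 5],
                                 reset_low=False))
# prints (True, 10)
```

Because `perm` is a bijection, requiring every prefix `[0, high)` to map into itself is equivalent to requiring every group `[low, high)` to map into itself, which is why the old cumulative check still gave the right answer. Advancing `low` simply avoids re-inspecting earlier entries, so each permutation entry is examined at most once.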