[MLIR][Arith] Fix index_cast/index_castui chain folding to check intermediate width (#189042)
The patterns `IndexCastOfIndexCast` and `IndexCastUIOfIndexCastUI` in
ArithCanonicalization.td incorrectly eliminated a pair of index casts
whenever the outer result type equalled the original source type,
without verifying that the intermediate cast was lossless.
For example, the following was wrongly folded to `%arg0`:
    %0 = index_castui %arg0 : i64 to index
    %1 = index_castui %0 : index to i8    // truncates to 8 bits
    %2 = index_castui %1 : i8 to index    // incorrectly removed
The pattern matched the `%1`/`%2` pair because the result type of `%2`
(`index`) equals the type of the value feeding `%1` (also `index`), even
though the i8 intermediate silently drops the upper 56 bits. The same bug
existed for the signed `index_cast` variant.
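To make the data loss concrete, here is a small standalone C++ sketch that
models the `index -> i8 -> index` round trip with plain integer casts. It
does not use MLIR; the helper names are hypothetical and `index` is assumed
to be 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of index_castui: index -> i8 -> index.
// Truncation keeps the low 8 bits; the cast back zero-extends.
constexpr uint64_t roundTripUnsigned(uint64_t v) {
  return static_cast<uint64_t>(static_cast<uint8_t>(v));
}

// Hypothetical model of index_cast: index -> i8 -> index.
// Truncation keeps the low 8 bits; the cast back sign-extends.
// (Narrowing to int8_t is modular on mainstream compilers and
// guaranteed to be so from C++20 onwards.)
constexpr int64_t roundTripSigned(int64_t v) {
  return static_cast<int64_t>(static_cast<int8_t>(v));
}

// Values that fit in 8 bits survive the round trip...
static_assert(roundTripUnsigned(200) == 200, "fits in i8 (unsigned)");
static_assert(roundTripSigned(-1) == -1, "fits in i8 (signed)");

// ...but wider values do not, so replacing %2 with the original
// value changes the result.
static_assert(roundTripUnsigned(300) == 44, "300 mod 256 == 44");
static_assert(roundTripSigned(200) == -56, "0xC8 read as signed i8");
```

Any input that does not fit in the intermediate type comes back changed, so
the cast pair cannot be treated as the identity.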
Fix: move the optimization into the `fold` methods of `IndexCastOp` and
`IndexCastUIOp` with an explicit check that the intermediate type is at
least as wide as the source type (using
`IndexType::kInternalStorageBitWidth` as the representative width for
`index`). Only then is the round-trip guaranteed lossless and the chain
can be collapsed.
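The condition can be sketched as follows. This is a minimal standalone
model, not the code from the patch: `SketchType`, `widthOf`, `sameType` and
`canFoldCastChain` are hypothetical names, and `kIndexStorageBitWidth` is
assumed to mirror `IndexType::kInternalStorageBitWidth` (64 upstream).

```cpp
#include <cassert>

// Hypothetical stand-in for an MLIR type: either `index` or iN.
struct SketchType {
  bool isIndex;
  unsigned intWidth; // only meaningful when !isIndex
};

// Assumption: mirrors IndexType::kInternalStorageBitWidth.
constexpr unsigned kIndexStorageBitWidth = 64;

// Representative bit width used for the losslessness check.
constexpr unsigned widthOf(SketchType t) {
  return t.isIndex ? kIndexStorageBitWidth : t.intWidth;
}

constexpr bool sameType(SketchType a, SketchType b) {
  return a.isIndex == b.isIndex && (a.isIndex || a.intWidth == b.intWidth);
}

// A chain src -> mid -> dst may be replaced by the source value only if
// the outer result type equals the source type AND the intermediate type
// is at least as wide as the source, so no bits are dropped on the way.
constexpr bool canFoldCastChain(SketchType src, SketchType mid,
                                SketchType dst) {
  return sameType(src, dst) && widthOf(mid) >= widthOf(src);
}

constexpr SketchType kIndex{true, 0};
constexpr SketchType kI8{false, 8};
constexpr SketchType kI64{false, 64};

// index -> i8 -> index: the intermediate is narrower, must not fold.
static_assert(!canFoldCastChain(kIndex, kI8, kIndex), "lossy chain");
// i8 -> index -> i8: the intermediate is wider, safe to fold.
static_assert(canFoldCastChain(kI8, kIndex, kI8), "lossless chain");
```

Under this model the chain from the example above is rejected, while
`i8 -> index -> i8` and `i64 -> index -> i64` are still collapsed.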
Fixes #90238
Fixes #90296
Assisted-by: Claude Code