Improve `has_similar_generate_outputs` assertions (#44166)
This patch improves the `has_similar_generate_outputs` assertion so that a failure reports full details about the mismatch instead of a bare boolean.
Before (CI output):
```
AssertionError: False is not true
```
After (example CI output):
```
AssertionError: Generate outputs are not similar enough (atol=1e-05, rtol=1e-05).
Sequence mismatch: 3/20 tokens differ (first at position 12).
Batch index: 0
Token at mismatch — output_1: 1542, output_2: 8903
Score diff at first mismatch — max: 2.384186e-04, mean: 1.127943e-05
```
This tells us immediately whether the failure is a tolerance issue (small diffs,
suggesting a slightly larger atol might be enough), a completely different
generation path, or a problem specific to one batch element.
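
For illustration, here is a minimal sketch of how a message of this shape could be assembled. It is not the code in this patch: `describe_generate_mismatch` is a hypothetical helper, it works on plain Python lists rather than tensors, and it assumes the scores are indexed by the same positions as the sequences. The exact wording of the message is approximate.

```python
def describe_generate_mismatch(seqs_1, seqs_2, scores_1, scores_2, atol=1e-5, rtol=1e-5):
    """Hypothetical helper: describe the first batch row whose tokens differ.

    seqs_*   : list of token-id lists, one per batch element.
    scores_* : scores_*[batch][position] is a list of per-vocab scores
               (assumed here to be aligned with the sequence positions).
    Returns None when every sequence matches.
    """
    for batch_idx, (row_1, row_2) in enumerate(zip(seqs_1, seqs_2)):
        # Positions where the two generated sequences disagree.
        mismatches = [i for i, (a, b) in enumerate(zip(row_1, row_2)) if a != b]
        if not mismatches:
            continue

        first = mismatches[0]
        # Absolute score differences at the first mismatching position.
        diffs = [
            abs(x - y)
            for x, y in zip(scores_1[batch_idx][first], scores_2[batch_idx][first])
        ]
        return (
            f"Generate outputs are not similar enough (atol={atol}, rtol={rtol}).\n"
            f"Sequence mismatch: {len(mismatches)}/{len(row_1)} tokens differ "
            f"(first at position {first}).\n"
            f"Batch index: {batch_idx}\n"
            f"Token at mismatch - output_1: {row_1[first]}, output_2: {row_2[first]}\n"
            f"Score diff at first mismatch - max: {max(diffs):.6e}, "
            f"mean: {sum(diffs) / len(diffs):.6e}"
        )
    return None


if __name__ == "__main__":
    # Toy data: one batch element, four tokens, two-entry "vocabulary".
    seqs_a = [[1, 2, 3, 4]]
    seqs_b = [[1, 2, 9, 4]]
    scores_a = [[[0.1, 0.2], [0.1, 0.2], [0.5, 0.25], [0.1, 0.2]]]
    scores_b = [[[0.1, 0.2], [0.1, 0.2], [0.25, 0.25], [0.1, 0.2]]]
    print(describe_generate_mismatch(seqs_a, seqs_b, scores_a, scores_b))
```

In a `unittest` test case, a string like this can be passed as the `msg` argument of `self.assertTrue(...)` or given to `self.fail(...)`, so it replaces the bare `False is not true` in the CI log.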