Merge ConstantOfShape and Cast in clean_graph optimization (#27482)
## Summary
- Resolve the TODO in `BertOnnxModel.clean_graph()` by folding the
`Cast` node into `ConstantOfShape`
- Update the `value` attribute dtype so `ConstantOfShape` directly
produces the target type
- Add the first dedicated test suite for `clean_graph`, with 3 tests
## Motivation
Fixes #27481
In the attention mask shape optimization path, `clean_graph()`
simplifies a `Concat → Unsqueeze → Gather` chain to a direct `Shape →
ConstantOfShape` connection, but leaves behind a redundant
`ConstantOfShape(float) → Cast(to=int64)` sequence. The existing TODO
(line 271) noted that these two nodes should be merged.
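To illustrate why the `Cast` is redundant, here is a small NumPy analogy (not ONNX itself, and not code from this PR): filling a tensor with a float constant and then casting it gives the same result as filling it with the already-cast constant. The shape `(2, 3)` is an arbitrary value chosen for the example.

```python
import numpy as np

shape = (2, 3)  # arbitrary example shape

# Two-step form: analogous to ConstantOfShape(float) followed by Cast(to=int64)
two_step = np.full(shape, np.float32(0)).astype(np.int64)

# One-step form: analogous to ConstantOfShape with an int64 `value`
one_step = np.full(shape, np.int64(0))

same_dtype = two_step.dtype == one_step.dtype
same_values = np.array_equal(two_step, one_step)
```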
## Changes
- **onnx_model_bert.py**: After the Concat path simplification, update
`ConstantOfShape`'s `value` attribute to the `Cast` target dtype using
`helper.tensor_dtype_to_np_dtype`, then redirect consumers from the
`Cast` output to the `ConstantOfShape` output. The `Cast` node is
collected for removal. (+2 imports: `numpy`, `numpy_helper`)
- **test_clean_graph.py** (new): 3 tests covering the merge (Cast
removed, ConstantOfShape produces int64), fill value preservation (0 →
int64(0)), and Concat path simplification (Concat/Gather pruned).
## Test Plan
- [x] 3 new tests pass:
`python -m unittest test_clean_graph.TestCleanGraph -v`
- [x] Existing `test_attention_fusion` tests pass
- [x] `ruff check` clean on both files