Improve speed in combining per-channel data (#21563)
### Description
<!-- Describe your changes. -->
Improve speed in combining `per-channel` data for using a single
`np.concatenate` instead of multiple `np.concatenates` within a for
loop.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix the issue https://github.com/microsoft/onnxruntime/issues/21562
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>