[wasm] Use relaxed SIMD dot product in CopyPackA (#25165)
### Description
This change replaces the previous zero-extend + 16-bit accumulation
sequence with a single `wasm_i32x4_relaxed_dot_i8x16_i7x16_add` operation
to compute row sums directly on 8-bit data.
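
For reviewers unfamiliar with the intrinsic, here is a rough scalar model of how a dot-product-add can produce row sums. It is a sketch, not the code in this PR. The function names `dot_i8x16_i7x16_add_model` and `row_sum_step` are hypothetical, and the all-ones second operand and the signed interpretation of the 8-bit data are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model (hypothetical helper) of one dot-product-add step:
 * each of the four 32-bit lanes accumulates the products of four
 * adjacent 8-bit pairs. The second operand is expected to hold 7-bit
 * values (0..127); outside that range the relaxed instruction may
 * give implementation-dependent results, so the model rejects it.
 */
void dot_i8x16_i7x16_add_model(const int8_t a[16], const int8_t b[16],
                               int32_t acc[4])
{
    for (int lane = 0; lane < 4; ++lane) {
        int32_t sum = acc[lane];
        for (int j = 0; j < 4; ++j) {
            int8_t w = b[4 * lane + j];
            assert(w >= 0); /* i7 operand: top bit must be clear */
            sum += (int32_t)a[4 * lane + j] * (int32_t)w;
        }
        acc[lane] = sum;
    }
}

/*
 * Assumption for illustration: multiplying by a vector of ones turns
 * the dot product into a plain sum, so 16 bytes of a row are folded
 * into four 32-bit partial sums without widening to 16 bits first.
 */
void row_sum_step(const int8_t row[16], int32_t acc[4])
{
    int8_t ones[16];
    for (int i = 0; i < 16; ++i) {
        ones[i] = 1;
    }
    dot_i8x16_i7x16_add_model(row, ones, acc);
}
```

In this model, calling `row_sum_step` once per 16-byte chunk of a row leaves four partial sums in `acc`; adding those four lanes together gives the row sum.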
### Motivation and Context
This update eliminates unpacking overhead and lifts the former
constraints on k stride.