Fix illegal memory access with multi_tensor_apply size above INT_MAX (#7639)

This change fixes an integer overflow in `TensorListMetadata`, whose
`sizes` array was declared as `int` (32-bit signed integer). Tensor sizes
exceeding `INT_MAX` (2^31 - 1) could not be represented, which caused
incorrect behavior (e.g., illegal memory accesses or no parameter updates).
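
To illustrate the failure mode, here is a minimal host-side sketch. It is not the actual DeepSpeed code: `MetadataBefore`, `MetadataAfter`, `kMaxTensors`, `fits_in_int`, `low_32_bits`, and `store_size` are hypothetical names, and the use of `int64_t` as the widened type is an assumption made for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical, simplified stand-ins for the metadata struct. The real
// TensorListMetadata carries more fields; only the sizes array matters here.
constexpr int kMaxTensors = 4;  // illustrative value, not the real limit

struct MetadataBefore {
    int sizes[kMaxTensors];      // 32-bit: cannot represent counts above INT_MAX
};

struct MetadataAfter {
    int64_t sizes[kMaxTensors];  // assumed 64-bit replacement type
};

// True when an element count survives storage in a 32-bit signed int.
constexpr bool fits_in_int(int64_t numel) {
    return numel >= INT_MIN && numel <= INT_MAX;
}

// The part of a count that remains if only its low 32 bits are kept.
// Conversion to an unsigned type is well-defined (modulo 2^32).
constexpr uint32_t low_32_bits(int64_t numel) {
    return static_cast<uint32_t>(numel);
}

// Store an element count in the widened struct without narrowing.
inline void store_size(MetadataAfter& meta, int slot, int64_t numel) {
    assert(slot >= 0 && slot < kMaxTensors);
    meta.sizes[slot] = numel;
}
```

In this sketch, a tensor with 2^32 + 5 elements keeps only 5 in its low 32 bits, so code that reads a narrowed size would see a count that has nothing to do with the real one. The widened struct stores the value unchanged.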
The same change was made to `multi_tensor_apply.cuh` in NVIDIA/apex PR
[#1825](https://github.com/NVIDIA/apex/pull/1825).
For further details, see issue
[#7640](https://github.com/deepspeedai/DeepSpeed/issues/7640).
Signed-off-by: Wang Yan <wangyan.blaze@bytedance.com>