deepcompile: Create dummy inputs using empty_strided (#7564)
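
A CUDA tensor's storage may be larger than numel() * dtype.itemsize
because of alignment padding, so its strides can reach offsets beyond
the first numel() elements. Building a dummy tensor with
torch.zeros().as_strided() allocates only numel() elements, and
as_strided() then fails with an out-of-bounds error for such strides.
Create dummy inputs with torch.empty_strided().zero_() instead, which
allocates storage sized for the requested strides.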
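
As a rough illustration of the size mismatch (a pure-Python sketch, not
the DeepCompile code; the helper name and the example layout are
hypothetical, and non-negative strides are assumed), the number of
elements a strided layout needs can exceed numel():

```python
from math import prod


def min_storage_elems(shape, strides):
    """Smallest element count a storage must hold for this layout.

    Assumes non-negative strides and a storage offset of 0.
    """
    if any(dim == 0 for dim in shape):
        return 0  # an empty tensor touches no storage
    # Offset of the last addressable element, plus one.
    return 1 + sum((dim - 1) * st for dim, st in zip(shape, strides))


# Hypothetical layout: 2 rows of 3 elements, each row padded to 4.
shape, strides = (2, 3), (4, 1)
numel = prod(shape)
needed = min_storage_elems(shape, strides)
print(numel, needed)  # 6 7
```

Here a buffer of numel() = 6 elements is one element too small to be
viewed with strides (4, 1), which is the situation as_strided() rejects.
torch.empty_strided(shape, strides) allocates for the strided layout
directly, so the mismatch does not arise.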

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>