DeepSpeed
660ee895 - deepcompile: Create dummy inputs using empty_strided (#7564)

Commit
101 days ago
deepcompile: Create dummy inputs using empty_strided (#7564)

CUDA tensors may have a larger storage than numel() * dtype.itemsize due to alignment considerations. Creating dummy tensors with torch.zeros().as_strided() leads to out-of-bounds errors in such cases. Create dummy inputs with empty_strided().zero_() instead.

Signed-off-by: Junjie Mao <junjie.mao@linux.alibaba.com>
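
The size mismatch described in the message can be illustrated with plain arithmetic. The following is a minimal sketch, not DeepSpeed code: the helper names `numel` and `min_storage_elems`, and the padded stride `(4, 1)`, are hypothetical and chosen only for illustration. It assumes a storage offset of 0 and non-negative strides, and compares how many elements a dense buffer of `numel()` elements holds with how many elements a view of the given size and stride can reach.

```python
from math import prod


def numel(size):
    """Number of logical elements in a tensor of the given size."""
    return prod(size)


def min_storage_elems(size, stride):
    """Smallest storage, in elements, that a view with this size and stride
    can address (storage offset 0, non-negative strides assumed)."""
    if any(dim == 0 for dim in size):
        return 0
    # The highest reachable index is sum((dim - 1) * stride); add 1 for the count.
    return 1 + sum((dim - 1) * st for dim, st in zip(size, stride))


# Illustrative layout: a 2x3 tensor whose rows are padded from 3 to 4 elements.
size, stride = (2, 3), (4, 1)

dense = numel(size)                        # elements in a numel()-sized buffer
needed = min_storage_elems(size, stride)   # elements the strided view can touch
fits = needed <= dense                     # False: the view would overrun the buffer
```

With these values a buffer sized by `numel()` is one element too small for the padded layout, which is the kind of overrun the message attributes to `as_strided()` on a dense zero tensor. Allocating by size and stride together, as `torch.empty_strided` does, avoids having to guess the storage size from `numel()`.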