Fix contiguity collapse on broadcasted dimensions (#1519)
Fixes #1514
The issue arises when we try to collapse a dimension into a broadcasted dimension. PyTorch marks every dimension with stride 1 as contiguous. This used to be fine, but we recently changed the ordering so that a broadcasted dimension with stride 0 can now be placed as a faster (inner) dimension. As a result, we may end up collapsing a normal dimension into a broadcasted one, which produces a wrong index, since the broadcasted dimension has stride 0.
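To make the wrong index concrete, here is a hypothetical illustration (the shape, strides, and element chosen below are not from the original issue, just an example of the failure mode):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  // Hypothetical broadcast tensor: logical shape [4, 8], strides [1, 0].
  // The inner dimension is broadcasted (stride 0), yet the outer
  // stride-1 dimension is still marked contiguous.
  std::vector<int64_t> sizes = {4, 8};
  std::vector<int64_t> strides = {1, 0};

  // Correct linear offset of element (i, j) via strides:
  int64_t i = 2, j = 5;
  int64_t correct = i * strides[0] + j * strides[1]; // = 2

  // Naively collapsing both dims into one dim of size 32 with stride 1
  // computes (i * 8 + j) * 1 = 21, which is past the 4-element backing
  // storage of the broadcast tensor.
  int64_t collapsed = (i * sizes[1] + j) * 1;

  std::cout << "correct=" << correct << " collapsed=" << collapsed << "\n";
}
```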
The solution is to explicitly check the stride of the inner dimension when deciding whether a contiguous dimension can be collapsed, and only collapse when the inner dimension is not broadcasted.
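A minimal sketch of the fixed check, assuming a plain sizes/strides representation (the function name and signature here are illustrative, not the actual nvFuser code):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical predicate: may dimension `outer` be collapsed into its
// inner neighbor `outer + 1`?
bool canCollapseInto(
    const std::vector<int64_t>& sizes,
    const std::vector<int64_t>& strides,
    size_t outer) {
  const size_t inner = outer + 1;
  // New check: never collapse into a broadcasted dimension. Its stride
  // is 0, so a collapsed index would not advance through memory.
  if (strides[inner] == 0) {
    return false;
  }
  // Usual contiguity condition: the outer stride equals the inner
  // dimension's extent times its stride.
  return strides[outer] == strides[inner] * sizes[inner];
}
```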