Fix a bug in THCTensor_(baddbmm) and ATen's addmm_cuda for strided view inputs (#42425)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42418.
The problem was that non-contiguous batched matrices were passed to `gemmStridedBatched`.
The following code fails on master and works with the proposed patch:
```python
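import torch
x = torch.tensor([[1., 2, 3], [4., 5, 6]], device='cuda:0')
# Overlapping strided view: c[b, i, j] reads x.flatten()[3*b + i + j]
c = torch.as_strided(x, size=[2, 2, 2], stride=[3, 1, 1])
# Batched matrix product of the view with itself
torch.einsum('...ab,...bc->...ac', c, c)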
```
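For reference, the values the reproduction above should produce can be worked out without a GPU. The sketch below is a pure-Python model of the strided view, not PyTorch code: `strided_view` and `matmul` are hypothetical helpers introduced only for illustration, and the model assumes a storage offset of 0. It shows that each 2x2 matrix in the batch has row stride and column stride both equal to 1, so its rows overlap in memory, which is presumably why the view cannot be handed to a strided-batched GEMM without first being made contiguous.

```python
# Pure-Python model of the strided view from the reproduction (no torch needed).
storage = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # x flattened in row-major order
size = (2, 2, 2)
stride = (3, 1, 1)


def strided_view(storage, size, stride):
    """Hypothetical helper: materialize a 3-D strided view as nested lists."""
    return [
        [
            [storage[b * stride[0] + i * stride[1] + j * stride[2]]
             for j in range(size[2])]
            for i in range(size[1])
        ]
        for b in range(size[0])
    ]


def matmul(a, b):
    """Hypothetical helper: plain matrix product of two nested-list matrices."""
    inner = len(b)
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


c = strided_view(storage, size, stride)
# Rows of each matrix overlap: c[0] is [[1, 2], [2, 3]], c[1] is [[4, 5], [5, 6]]
expected = [matmul(mat, mat) for mat in c]
print(c)
print(expected)
```

With the patch applied, the `einsum` call in the reproduction should agree with `expected` from this model.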
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42425
Reviewed By: glaringlee
Differential Revision: D22925266
Pulled By: ngimel
fbshipit-source-id: a72d56d26c7381b7793a047d76bcc5bd45a9602c