Fix bug in atomicAdd for int16_t (#29231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29231
Fixes: https://github.com/pytorch/pytorch/issues/29153
The bug was that atomicAdd did not correctly add values for some dtypes due to incorrect casting, so the affected additions returned zeros.
Incorrect behavior before this PR:
```
In [23]: sparse=torch.sparse_coo_tensor(indices=torch.tensor([[0,0],[1,1]]), values=torch.tensor([5, 6], dtype=torch.int16), size=(2,2), device='cuda', dtype=torch.int16 )
In [24]: sparse
Out[24]:
tensor(indices=tensor([[0, 0],
                       [1, 1]]),
       values=tensor([5, 6]),
       device='cuda:0', size=(2, 2), nnz=2, dtype=torch.int16,
       layout=torch.sparse_coo)
In [25]: sparse.coalesce()
Out[25]:
tensor(indices=tensor([[0],
                       [1]]),
       values=tensor([11]),
       device='cuda:0', size=(2, 2), nnz=1, dtype=torch.int16,
       layout=torch.sparse_coo)
In [26]: sparse.to_dense()
Out[26]:
tensor([[0, 0],
[0, 0]], device='cuda:0', dtype=torch.int16)
In [27]: sparse.coalesce().to_dense()
Out[27]:
tensor([[ 0, 11],
[ 0, 0]], device='cuda:0', dtype=torch.int16)
In [30]: torch.add(torch.zeros([2,2],dtype=torch.int16, device='cuda'), sparse)
Out[30]:
tensor([[0, 0],
[0, 0]], device='cuda:0', dtype=torch.int16)
```
Test Plan: Imported from OSS
Differential Revision: D18575666
Pulled By: nairbv
fbshipit-source-id: 9b193b386bf4a9615014aa890d2e9f4f694940ac