[quant][core][feature] Implement index_put for quantized CUDA tensors (#85685)
Summary:
- Add new cuda test for quantized index_put
- Add deterministic tests for CPU and CUDA quantized index_put
- Add in QuantizedCUDA implementation for index_put
- Write new kernel `index_put_kernel_quantized_cuda`
- Implement deterministic CUDA index_put in `index_put_with_sort_kernel_quantized`
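For reviewers, here is a pure-Python reference for what index_put on a per-tensor quantized tensor computes. Each float value is quantized with the tensor's own scale and zero_point, clamped, and written into the integer representation. Target positions are stably sorted first, so repeated indices are written in a fixed order and the last occurrence wins on every run. This is a minimal sketch, not the kernel code. `index_put_quantized_ref` is a hypothetical name, and several details are assumptions: a quint8 range of [0, 255], flat non-negative indices, and accumulate=False semantics.

```python
def index_put_quantized_ref(int_repr, scale, zero_point, indices, values,
                            qmin=0, qmax=255):
    """Hypothetical reference for index_put on a quantized tensor.

    Assumes quint8 bounds by default, flat non-negative indices, and
    no accumulation (plain overwrite).
    """
    out = list(int_repr)
    # Stable sort of input positions by target index (sorted() is stable),
    # so duplicate targets are written in a fixed, reproducible order.
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    for i in order:
        # Affine quantization; Python's round() rounds half to even.
        q = round(values[i] / scale) + zero_point
        out[indices[i]] = min(max(q, qmin), qmax)
    return out


# Index 2 appears twice; the later value (3.0) wins deterministically.
print(index_put_quantized_ref([10, 10, 10, 10], 0.5, 10, [2, 0, 2], [1.0, -1.0, 3.0]))
```

Sorting by target index is one common way to make duplicate-index writes reproducible on parallel hardware. Whether the CUDA kernel breaks ties exactly this way is not stated here.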
I think `quantize_val<scalar_t>` is not CUDA-compatible because it relies on
`std::numeric_limits` for its clamping bounds. A device-compatible version
might be worth adding in the future.
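The clamping bounds that `quantize_val` takes from `std::numeric_limits` can also be computed from the integer type's bit width and signedness alone. That is one possible route to a device-friendly version. The Python sketch below only illustrates the arithmetic. `qrange` and `quantize_val_ref` are hypothetical names, not PyTorch APIs, and Python floats are float64, so results can differ from a float32 kernel at rounding boundaries.

```python
def qrange(bits, signed):
    """Hypothetical helper: integer range of a `bits`-wide type, no numeric_limits."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def quantize_val_ref(scale, zero_point, value, bits=8, signed=False):
    """Hypothetical affine quantization: round(value / scale) + zero_point, clamped."""
    qmin, qmax = qrange(bits, signed)
    q = round(value / scale) + zero_point  # round half to even
    return min(max(q, qmin), qmax)


print(qrange(8, False), qrange(8, True))  # quint8 and qint8 ranges
print(quantize_val_ref(0.1, 0, 1000.0, signed=True))  # saturates at the qint8 max
```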
Test Plan:
```
python test/test_quantization.py -k test_qtensor_index_put
```
Reviewers:
Subscribers:
Tasks:
Tags: quant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85685
Approved by: https://github.com/dzdang