Add broadcastable check to index_put (#94849)
The check is copied from
https://github.com/pytorch/pytorch/blob/989299802cf83f8e3634b34028ecf08d76746307/aten/src/ATen/native/TensorAdvancedIndexing.cpp#L582-L583
which is used for both the CPU and CUDA paths, unless the op is called on GPU with `deterministicAlgorithms()` set to true.
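
For illustration only, here is a minimal pure-Python sketch of what a broadcastable check of this kind does. It assumes the usual broadcasting rule (trailing dimensions are aligned, and each value dimension must either match the target or be 1). The helper names `is_expandable_to` and `check_values_broadcastable`, and the error text, are hypothetical and are not the ATen code itself.

```python
from typing import Sequence


def is_expandable_to(shape: Sequence[int], desired: Sequence[int]) -> bool:
    """Return True if `shape` can be broadcast (expanded) to `desired`.

    Hypothetical helper: dimensions are compared right to left, and a
    dimension is compatible if it equals the target size or is 1.
    """
    if len(shape) > len(desired):
        # A tensor cannot be expanded to fewer dimensions than it has.
        return False
    for size, target in zip(reversed(shape), reversed(desired)):
        if size != target and size != 1:
            return False
    return True


def check_values_broadcastable(value_shape: Sequence[int],
                               result_shape: Sequence[int]) -> None:
    """Raise early instead of letting a backend kernel see bad shapes."""
    if not is_expandable_to(value_shape, result_shape):
        raise ValueError(
            f"shape mismatch: value tensor of shape {tuple(value_shape)} "
            f"cannot be broadcast to indexing result of shape "
            f"{tuple(result_shape)}"
        )
```

With this sketch, `check_values_broadcastable((1, 3), (2, 3))` passes silently, while `check_values_broadcastable((2,), (3,))` raises `ValueError`.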
Follow-up: do the same for XLA and fix the case when indices are not null.
Fixes https://github.com/pytorch/pytorch/issues/94667
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94849
Approved by: https://github.com/ngimel