[MPS] Fix index_put with deterministic algorithm enabled (#97660)
Avoid the parallel code path in MPS `index_put` when deterministic algorithms are enabled.
Fixes #97574
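To illustrate why the parallel path is avoided in deterministic mode, here is a minimal pure-Python model of `index_put` with duplicate indices. It is a sketch, not the Metal kernel from this PR: `index_put_in_order` and the `order` argument are hypothetical names used only to stand in for the order in which writes happen to land.

```python
from itertools import permutations


def index_put_in_order(dest, indices, values, order, accumulate=False):
    """Apply the writes values[k] -> dest[indices[k]] in the given order.

    Hypothetical helper: `order` models the sequence in which threads
    happen to commit their writes.
    """
    out = list(dest)
    for k in order:
        i = indices[k]
        out[i] = out[i] + values[k] if accumulate else values[k]
    return out


dest = [0.0, 0.0, 0.0, 0.0]
indices = [1, 1, 3, 1]            # index 1 is written three times
values = [10.0, 20.0, 30.0, 40.0]

# Serial execution: writes land in index order, so the last one wins.
serial_order = range(len(indices))
serial = index_put_in_order(dest, indices, values, serial_order)

# Unordered execution: every commit order is possible in principle.
outcomes = {
    tuple(index_put_in_order(dest, indices, values, order))
    for order in permutations(range(len(indices)))
}

# With accumulate=True the duplicates are summed instead of overwritten.
accumulated = index_put_in_order(
    dest, indices, values, serial_order, accumulate=True
)
```

In this model the serial result is always the same, while the unordered case can produce one distinct result per duplicate writer. That is the kind of run-to-run variation that a single-threaded kernel rules out.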
Benchmark:
```
[--------------- index_put_ Deterministic Algorithm Enabled ---------------]
                                                              |  cpu  |  mps
1 threads: -----------------------------------------------------------------
      Dtype: torch.float32 Features: 1024; Num Indices: 512   |   37  |   49
      Dtype: torch.float32 Features: 1024; Num Indices: 1024  |   54  |   50
      Dtype: torch.float32 Features: 1024; Num Indices: 2048  |   86  |   50
      Dtype: torch.float32 Features: 1024; Num Indices: 4096  |  150  |   49

Times are in microseconds (us).

[-------------- index_put_ Deterministic Algorithm Disabled ---------------]
                                                              |  cpu  |  mps
1 threads: -----------------------------------------------------------------
      DType: torch.float32 Features: 1024; Num Indices: 512   |   37  |   49
      DType: torch.float32 Features: 1024; Num Indices: 1024  |   53  |   49
      DType: torch.float32 Features: 1024; Num Indices: 2048  |   86  |   49
      DType: torch.float32 Features: 1024; Num Indices: 4096  |  147  |   50

Times are in microseconds (us).
```
### <samp>🤖 Generated by Copilot at ebf2ff3</samp>
Added a deterministic version of `index_put` for MPS tensors that runs on a single thread and is selected when the global deterministic-algorithms flag is set (for example via `torch.use_deterministic_algorithms(True)`). Refactored the existing `index_put` function and the kernel selection logic to support both parallel and serial modes. Added a test that verifies the deterministic behavior of `index_put` under different conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97660
Approved by: https://github.com/kulinseth