[MPS] Add PSO caching for advanced indexing kernels (#99855)
Use bindless argument buffers (unbounded arrays) for the advanced indexing kernels. This allows the pipeline state objects (PSOs) to be cached, since we no longer have to query the main Metal function for the argument buffer (AB) size; the AB is now filled directly on the CPU.
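
For illustration only, the caching pattern this change enables can be sketched in plain C++. This is not the code from the PR: `PipelineState`, `PipelineCache`, and the `compile` callback are hypothetical stand-ins for the Metal objects, and the sketch assumes a PSO can be keyed by kernel name alone once nothing about it has to be queried per call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled pipeline state object (PSO).
struct PipelineState {
    std::string kernel_name;
};

// Minimal cache sketch: build a PSO once per kernel name, then reuse it.
class PipelineCache {
public:
    using Compiler =
        std::function<std::shared_ptr<PipelineState>(const std::string&)>;

    explicit PipelineCache(Compiler compile) : compile_(std::move(compile)) {}

    std::shared_ptr<PipelineState> get(const std::string& kernel_name) {
        auto it = cache_.find(kernel_name);
        if (it != cache_.end()) {
            return it->second;  // cache hit: no recompilation
        }
        // Cache miss: compile once and remember the result.
        auto pso = compile_(kernel_name);
        assert(pso && "compile callback must return a PSO");
        cache_.emplace(kernel_name, pso);
        return pso;
    }

    std::size_t size() const { return cache_.size(); }

private:
    Compiler compile_;
    std::unordered_map<std::string, std::shared_ptr<PipelineState>> cache_;
};
```

Repeated calls to `get` with the same kernel name return the same object and invoke the compile callback only once. Keying by name is an assumption of this sketch; the real cache key may include more state.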
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99855
Approved by: https://github.com/kulinseth