[MPS] Add encoder coalescing support for native kernels (#99810)
Add support for kernel coalescing to native kernels.
This change reuses the same compute command encoder across successive Metal kernel dispatches. Coalescing stops when a graph op is encountered.
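
The behavior described above can be illustrated with a toy model. This is a minimal sketch only, not the code in this PR: the class `ToyStream` and its methods (`dispatch_kernel`, `run_graph_op`, `end_coalescing`) are hypothetical names invented for illustration, and a plain Python list stands in for a compute command encoder.

```python
# Toy model of encoder coalescing. All names here are hypothetical and
# do not correspond to the actual MPS backend API.

class ToyStream:
    def __init__(self):
        self.encoders_created = 0  # number of encoders opened so far
        self._active = None        # currently open encoder, or None
        self.log = []              # finished work, in submission order

    def _encoder(self):
        # Reuse the open encoder if there is one; otherwise open a new one.
        if self._active is None:
            self._active = []
            self.encoders_created += 1
        return self._active

    def dispatch_kernel(self, name):
        # Native kernel: encode into the shared encoder.
        self._encoder().append(name)

    def end_coalescing(self):
        # Close the open encoder, if any, and record what it contained.
        if self._active is not None:
            self.log.append(("encoder", tuple(self._active)))
            self._active = None

    def run_graph_op(self, name):
        # In this model a graph op ends coalescing before it runs.
        self.end_coalescing()
        self.log.append(("graph", name))


def demo():
    s = ToyStream()
    s.dispatch_kernel("k1")
    s.dispatch_kernel("k2")  # shares the encoder opened for k1
    s.run_graph_op("g1")     # coalescing stops here
    s.dispatch_kernel("k3")  # opens a second encoder
    s.end_coalescing()
    return s.encoders_created, s.log
```

In this model, three kernel dispatches separated by one graph op use two encoders rather than three.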
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99810
Approved by: https://github.com/kulinseth