[MPS] Lazy initialize allocators (#78227)
Do not construct the MPS allocators at load time; create them lazily
on first use instead.
This significantly reduces `libtorch.dylib` load time and prevents the
flicker that occurs during `import torch` when an Intel MacBook switches
from integrated to discrete graphics.
Before the change, `python3 -c "import timeit;import importlib;print(timeit.timeit(lambda: importlib.import_module('torch'), number=1))"` takes about 1 sec; after the change it drops to about 0.6 sec.
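For readers unfamiliar with the pattern, lazy initialization can be sketched with a C++ function-local static. This is an illustration only, not the code from this PR: `DemoAllocator` and `getDemoAllocator` are hypothetical names, and the construction counter exists only to make the timing of construction observable.

```cpp
#include <cstddef>

// Hypothetical stand-in for an allocator whose constructor is expensive
// (for example, one that acquires a GPU device). Not the real MPS allocator.
class DemoAllocator {
 public:
  DemoAllocator() { ++constructions; }

  // Pretend allocation: only tracks the running total of requested bytes.
  std::size_t allocate(std::size_t nbytes) {
    total_ += nbytes;
    return total_;
  }

  // Counts how many times the constructor has run.
  static int constructions;

 private:
  std::size_t total_ = 0;
};

int DemoAllocator::constructions = 0;

// Eager variant (what the lazy pattern avoids): a namespace-scope object
// like the one below would be constructed as soon as the library is loaded.
//   static DemoAllocator g_allocator;

// Lazy variant: the function-local static is constructed on the first call
// only, and C++11 guarantees that this initialization is thread-safe.
DemoAllocator& getDemoAllocator() {
  static DemoAllocator allocator;
  return allocator;
}
```

With this shape, code that never calls `getDemoAllocator()` never pays the construction cost, and every caller receives the same instance.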
Minor changes:
- Deleted unused `__block id<MTLBuffer> buf = nil;` from `HeapAllocatorImpl`
- Added braces to single-line `if` statements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78227
Approved by: https://github.com/kulinseth, https://github.com/albanD