Various fixes to run on MPS (#7767)
Attempted to run DeepSpeed (ZeRO stage 0) with fp32 on an MPS device and
ran into a few issues; these fixes resolve them.
* MPS has no CUDA-like timer events, so timing should fall back to host
timers
* When the abstract accelerator doesn't define a comm backend, we
shouldn't trigger a broadcast or all-reduce
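
The two fallbacks above can be sketched roughly as follows. This is an
illustration only, not the code in this change: `HostTimer`, `make_timer`,
`maybe_all_reduce`, `event_factory`, and `all_reduce_fn` are hypothetical
names, and the sketch avoids importing torch or DeepSpeed so it runs
anywhere.

```python
import time


class HostTimer:
    """Wall-clock timer for devices without event-based timing."""

    def __init__(self):
        self._start = None
        self.elapsed_ms = 0.0

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        # Convert the elapsed host time from seconds to milliseconds.
        self.elapsed_ms = (time.perf_counter() - self._start) * 1000.0
        self._start = None


def make_timer(event_factory=None):
    """Return a device timer if one can be built, else a host timer.

    `event_factory` is a hypothetical callable that builds a device-side
    timer; pass None for a device that lacks timer events (assumed to be
    the MPS case here).
    """
    if event_factory is None:
        return HostTimer()
    return event_factory()


def maybe_all_reduce(values, comm_backend_name, all_reduce_fn):
    """Run the collective only when a comm backend is defined."""
    if comm_backend_name is None:
        # No backend defined: skip the collective and return the input.
        return values
    return all_reduce_fn(values)
```

Passing the backend name and the collective in as arguments keeps the
guard in one place, so callers don't each need their own device check.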