Experimental: allow fp16 in `mps` (#961)
* Docs: refer to pre-RC version of PyTorch 1.13.0.
* Remove temporary workaround for unavailable op.
* Update comment to make it less ambiguous.
* Remove use of contiguous in mps.
It appears to no longer be necessary.
* Special case: use einsum for much better performance in mps
* Update mps docs.
* MPS: make pipeline work in half precision.
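
For illustration only, a minimal sketch of the two ideas mentioned above: selecting half precision when running on `mps`, and special-casing attention scores to use `torch.einsum` on that device. The function names (`pick_device_and_dtype`, `einsum_scores`, `matmul_scores`, `attention_scores`) are hypothetical and not taken from this change; the actual code in the PR may differ.

```python
import torch


def pick_device_and_dtype():
    # Hypothetical helper: prefer mps with fp16 when available, else cpu with fp32.
    if torch.backends.mps.is_available():
        return torch.device("mps"), torch.float16
    return torch.device("cpu"), torch.float32


def einsum_scores(query, key, scale):
    # query: (batch, q_len, dim), key: (batch, k_len, dim) -> (batch, q_len, k_len)
    return torch.einsum("bid,bjd->bij", query, key) * scale


def matmul_scores(query, key, scale):
    # Same result via batched matmul against the transposed key.
    return torch.matmul(query, key.transpose(-1, -2)) * scale


def attention_scores(query, key, scale):
    # Assumed dispatch: take the einsum path only on mps, the matmul path elsewhere.
    if query.device.type == "mps":
        return einsum_scores(query, key, scale)
    return matmul_scores(query, key, scale)
```

Both paths compute the same scaled dot products, so the special case is assumed to change only performance on `mps`, not results (up to floating-point tolerance).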