add Half to cat fast path on CPU (#96078)
Extend the existing fast path for `cat` to support `Half`. Because `cat` is a non-arithmetic op, the `Half` path simply uses `Vec::load` and `Vec::store` to move the data.
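For illustration only, here is a minimal sketch of why load/store is enough. It assumes each `Half` value is held as its raw 16-bit pattern (`std::uint16_t`). `VecHalf`, `copy_half` and `cat_last_dim` are hypothetical names, not ATen's actual `Vec` types or the real kernel. The sketch concatenates row-major 2-D inputs along the last dimension, using full-width vector copies followed by a scalar tail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 128-bit SIMD register with eight 16-bit lanes.
// It is NOT ATen's Vec type; it only mirrors the load/store pattern.
struct VecHalf {
  static constexpr std::size_t kSize = 8;  // 8 x 16 bits = 128 bits
  std::array<std::uint16_t, kSize> lanes;

  // Unaligned load of kSize raw Half bit patterns; no conversion to float.
  static VecHalf load(const std::uint16_t* src) {
    VecHalf v;
    std::memcpy(v.lanes.data(), src, sizeof(v.lanes));
    return v;
  }
  // Store the raw bits back unchanged.
  void store(std::uint16_t* dst) const {
    std::memcpy(dst, lanes.data(), sizeof(lanes));
  }
};

// Copy n Half values: full vectors first, then a scalar tail.
void copy_half(std::uint16_t* dst, const std::uint16_t* src, std::size_t n) {
  std::size_t i = 0;
  for (; i + VecHalf::kSize <= n; i += VecHalf::kSize) {
    VecHalf::load(src + i).store(dst + i);
  }
  for (; i < n; ++i) dst[i] = src[i];
}

// Concatenate row-major 2-D Half inputs along the last dimension.
// Every input has `rows` rows; input k has cols[k] columns.
std::vector<std::uint16_t> cat_last_dim(
    const std::vector<const std::uint16_t*>& inputs,
    const std::vector<std::size_t>& cols, std::size_t rows) {
  assert(inputs.size() == cols.size());
  std::size_t out_cols = 0;
  for (std::size_t c : cols) out_cols += c;
  std::vector<std::uint16_t> out(rows * out_cols);
  for (std::size_t r = 0; r < rows; ++r) {
    std::uint16_t* dst = out.data() + r * out_cols;
    for (std::size_t k = 0; k < inputs.size(); ++k) {
      copy_half(dst, inputs[k] + r * cols[k], cols[k]);
      dst += cols[k];
    }
  }
  return out;
}
```

Since no value is ever computed on, every bit pattern (including NaN payloads) is copied through unchanged, and no widening to `float` and narrowing back is needed.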
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96078
Approved by: https://github.com/jgong5, https://github.com/ezyang